DOCUMENT RESUME 



ED 363 625 TM 020 623 

TITLE Advances in Education Research, Volume 1, No. 1,

Summer 1993. 

INSTITUTION Office of Educational Research and Improvement (ED), 

Washington, DC, Office of Research. 
REPORT NO OR-93-3108 
PUB DATE 93 
NOTE 296p. 

PUB TYPE Collected Works - General (020) — Collected Works - 

Serials (022) 

JOURNAL CIT Advances in Education Research; v1 n1 Sum 1993

EDRS PRICE MF01/PC12 Plus Postage. 

DESCRIPTORS Academic Achievement; *Adolescents; *Children;

*Community Involvement; Compensatory Education;

*Disadvantaged Youth; Educational Change;

*Educationally Disadvantaged; Educational Policy;

Educational Practices; *Educational Research;

Elementary Secondary Education; Program Improvement;

Remedial Instruction; Scholarly Journals; Track

System (Education); Urban Schools
IDENTIFIERS *Policy Issues

ABSTRACT 

Thirteen previously published articles from selected 
journals are presented, which concentrate on educationally 
disadvantaged children and youth, and are grouped under the four 
general themes or categories of children and youth, school practices, 
community involvement, and policy issues related to the Chapter 1 
program. All the articles have been cited either in "Current Index to
Journals in Education (CIJE)" or in "Resources in Education (RIE)."
Their locator numbers starting with EJ or ED are included in this
abstract. The following articles are included: (1) "Demographic
Disparities of Inner-City Eighth Graders" (Samuel S. Peng, Margaret
C. Wang, and Herbert J. Walberg) (EJ440492); (2) "Educational Levels
of Adolescent Childbearers at First and Second Births" (Diane
Scott-Jones) (EJ436971); (3) "Explaining Within-Semester Changes in
Student Effort in Junior High School and Senior High School Courses"
(Douglas J. Mac Iver, Deborah J. Stipek, and Denise H. Daniels)
(EJ436880); (4) "Preventing Early Reading Failure with One-to-One
Tutoring: A Review of Five Programs" (Barbara A. Wasik and Robert E.
Slavin) (ED324122); (5) "Responsive Practices in the Middle Grades:
Teacher Teams, Advisory Groups, Remedial Instruction, and School
Transition Programs" (Douglas J. Mac Iver and Joyce L. Epstein)
(EJ436976); (6) "School Competency Testing Reforms and Student
Achievement: Exploring a National Perspective" (Linda F. Winfield)
(EJ415877); (7) "Achievement Effects of Ability Grouping in Secondary
Schools: A Best-Evidence Synthesis" (Robert E. Slavin) (EJ417571 and
ED322565); (8) "The Variable Effects of High School Tracking" (Adam
Gamoran) (EJ456685); (9) "Community Involvement and Disadvantaged
Students: A Review" (Saundra Murray Nettles) (EJ436841); (10) "Using
Community Adults as Advocates or Mentors for At-Risk Middle School
Students: A Two-Year Evaluation of Project RAISE" (James A.
McPartland and Saundra Murray Nettles) (EJ436975 and ED337536); (11)
"Lessons from the Field: Case Studies of Evolving Schoolwide
Projects" (Linda F. Winfield) (EJ438594); (12) "Modifying Chapter 1
Program Improvement Guidelines To Reward Appropriate Practices"
(Robert E. Slavin and Nancy A. Madden) (EJ438596); and (13) "Chapter
1 Program Improvement: Cause for Cautious Optimism and a Call for
Much More Research" (Sam Stringfield, Shelley H. Billig, and Alan
Davis) (EJ4386G0). References accompany each article. (SLD)



Advances in Education Research



Office of Research 
Office of Educational Research and Improvement 






U.S. Department of Education 

Richard W. Riley 
Secretary 

Office of Educational Research and Improvement 

Sharon P. Robinson 
Assistant Secretary 

Office of Research 

Joseph C. Conaty
Acting Director 



Summer 1993 



The Office of Research has obtained permission from 
the copyright holders to reproduce certain quoted 
material in this report. Further reproduction of this 
material is prohibited without specific permission of the 
copyright holders. All other material contained in this 
report is in the public domain and may be used and 
reprinted without special permission; citation as to 
source, however, is expected. 







Foreword 




Advances in Education Research makes available to the public peer- 
reviewed, scholarly research supported in whole or in part by the Office of Educa- 
tional Research and Improvement through its educational research and 
development programs. The goals of Advances in Education Research are to: 
bring together from diverse scholarly sources first-rate, exemplary research that 
relates to an important educational theme or topic; disseminate the results of 
funded research more widely to researchers, educators, and policymakers; serve 
as a forum for discussing, debating, and exchanging research results and perspec- 
tives of researchers and education practitioners; and increase public awareness 
of, access to, and use of high quality education research that is central and indis- 
pensable to improving and strengthening American education. 



Advances in Education Research is produced by the Office of Research of the Of- 
fice of Educational Research and Improvement. The mission of the Office of Re- 
search is to support research that helps to ensure equal access to education and to 
promote educational excellence throughout the nation. The aim is to provide to 
the American public the best available research-based information about every 
level of education. These goals are accomplished through basic and applied re- 
search carried out by the Office of Research, and by universities, school districts, 
teachers, and individuals across the nation. Accordingly, three questions guide the 
work the Office of Research supports and does: Is an important problem being ad- 
dressed? Will greater knowledge or understanding come from the results of this 
work? What utility or benefit will this work have for education? For further infor- 
mation on Advances in Education Research, and on the Office of Research and 
the work it supports, or to contribute articles, please write or call: 



Office of Research 
555 New Jersey Avenue NW
Washington, DC 20208-5573 
Voice: 202-219-2079 

Fax: 202-219-2030 



This first volume of Advances in Education Research includes previously
published articles from selected refereed journals which are briefly described
at the end of this volume. The articles are reproduced with the permission of
the authors and the journals in which they originally appeared. They were
written by one or more individuals affiliated with the Office of Research's
university-based National Educational Research and Development Centers.
However, the views expressed in these articles are those of the authors and do
not necessarily reflect the position or policy of the Office of Educational
Research and Improvement or the Office of Research, and no official
endorsement should be inferred.

This volume presents selected research articles related to issues
on educationally disadvantaged children and youth. Each article
in this volume was previously published in a refereed journal.

Perhaps the most urgent and compelling issue facing American education is
how best to improve and strengthen the quality of education and performance of
educationally disadvantaged students, that is, students who confront multiple
kinds of problems which interfere with and impede their success in school, and
which are frequently beyond their control. The articles in this issue of
Advances in Education Research address and focus on some of the most important
problems and circumstances germane to educationally disadvantaged children and
youth. These articles cover distinct, though interrelated, themes, topics, and
levels of education. They reflect an interdisciplinary approach to research on
educationally disadvantaged students. They also represent various conceptual,
methodological, and analytical approaches. And they demonstrate the kinds of
significant contributions research can and does make to preventing and
overcoming problems and barriers faced by educationally disadvantaged students.

The 13 articles included in this first volume are grouped under four general
themes or categories: (1) children and youth; (2) school practices; (3) community 
involvement; and (4) policy issues related to Chapter 1. 



Children and Youth 



Within this broad category, Peng, Wang, and Walberg (Demographic Disparities
of Inner-City Eighth Graders) give us a context within which the backgrounds of
inner-city students can be better understood. Based on an analysis of the
National Center for Education Statistics' National Education Longitudinal Study
of 1988 (NELS:88), Peng, Wang, and Walberg provide a demographic and
socioeconomic profile of middle-grade students, comparing those who are
enrolled in inner-city schools with those who are enrolled in schools in other
types of communities. Peng and his colleagues show that children attending
inner-city schools are quite different from those children attending schools in
suburban or rural communities. For example, the vast majority of children in
inner-city schools are African American and Hispanic, they do not live with
both natural parents, and they live in poverty.

The results of Peng et al.'s analysis underscore the importance for educators
of knowing and understanding more than has been traditionally required.
Effective approaches to teaching and learning of inner-city youth must reflect
an awareness and appreciation of students' backgrounds, readiness, motivations,
interests, and developmental skills. Educators will need to have a keen sense
not only of the academic strengths and weaknesses of inner-city children, but
also of their cultures and family backgrounds. These findings also imply that
schools alone cannot correct many of the problems affecting the education of
inner-city children. That is, schools must involve family members and must work
with community health and social service agencies to prevent and solve many of
the problems confronting inner-city youth.

Diane Scott-Jones (Educational Levels of Adolescent Childbearers at First and
Second Births) demonstrates that the patterns of childbearing and educational
attainment among white, black, and Hispanic adolescents, 15-19 years of age,
vary significantly. For instance, young Hispanic mothers have fewer years of
schooling than do their black and white counterparts. And, compared to white
mothers, black adolescent mothers generally complete the same or a slightly
greater number of grades.

Scott-Jones' analysis also shows that the number of school years completed by
adolescent mothers in the 15-19 age group is, on average, lower than that of the
national median age group. This is not true in all cases, particularly for
15-year-olds, and for 15-16-year-old black mothers. The median number of years
of schooling completed by 15-year-old mothers is basically the same as the
national median. Further, the median educational level of 15- and 16-year-old
black mothers is higher than that of the national cohort.

In addition, Scott-Jones' study shows that the educational levels of fathers and
mothers in the 15-19 age cohort are positively correlated within each
racial/ethnic group. The educational implications of Scott-Jones' research for
adolescent pregnancy are significant and far reaching. It suggests the
importance of establishing effective school-based policies and programs that
(a) deter premature sexual activity and unplanned pregnancies, (b) promote the
educational progress of adolescents who become pregnant, and (c) prevent
students from dropping out of school because of early parenthood.



The last article included in the "Children and Youth" section looks at another
important issue: student effort. Mac Iver, Stipek, and Daniels (Explaining
Within-Semester Changes in Student Effort in Junior High School and Senior High
School Courses) make the point that regardless of the course taken, as the
semester progresses some students lose interest and reduce their effort and
other students do better and try harder. Why? What factors account for
within-semester changes in student effort? The authors state: "Virtually every
theory of motivation suggests that changes in ability perceptions partially
determine changes in effort. Researchers have also cited changes in students'
valuing of the course and changes in extrinsic pressures as determinants of
effort changes."

In studying junior and senior high school students, Mac Iver, Stipek, and Daniels
find that changes in students' perceived abilities (in a subject) directly affect their
effort and the value they place on a particular subject matter. These results are
"consistent with the claim that, by reducing the number of students who believe
they are 'not good' in a subject, teachers can increase the number of students who
work near their potential."

The work of Mac Iver and his colleagues suggests that a number of strategic
changes may be required to improve and strengthen students' motivation and
performance. For example, if principals and teachers want to raise students'
confidence in their abilities in order to boost their classroom effort, then
they may be required to make specific changes in curriculum and instruction,
task structures, ability-grouping policies, and student evaluation practices.



School Practices



What evidence exists on the success or effectiveness of one-to-one tutorial
programs? Are some programs more effective than others? Wasik and Slavin
(Preventing Early Reading Failure With One-to-One Tutoring: A Review of Five
Programs), using a "best-evidence synthesis" of 16 evaluation studies, review
and compare the effectiveness of five one-to-one tutorial reading programs that
have been used to improve the reading skills of first-graders who are at risk for
reading failure.

Their review focuses on: (1) Reading Recovery, a preventive tutoring program
developed in New Zealand and widely used in the U.S.; (2) Success for All, a
comprehensive schoolwide program with a major one-to-one tutoring component
for primary grade students; (3) Prevention of Learning Disabilities, a program
based on a physiological view of learning and learning disorders; (4) Wallach
Tutoring Program, a program targeted to first-graders in which paraprofessional
tutors are used; and (5) Programmed Tutorial Reading, a highly structured first
grade reading program using paraprofessionals, volunteers, or parents.

Despite the many differences among these programs, including the extent of
their effectiveness, overall Wasik and Slavin's analysis shows substantial
positive results of one-to-one tutoring compared to the results of traditional
methods. Further, the effects of tutoring are generally lasting. Tutorial
reading programs are most effective when they include many, instead of few,
components of the reading process, when they emphasize the content of the
reading program in addition to the delivery style (i.e., one-to-one tutoring),
and when they use certified teachers rather than paraprofessionals. The authors
also suggest that tutoring programs, although costly, appear to be more
effective than other types of expensive intervention strategies (e.g., reduced
class size) currently in use.

Besides offering many interesting findings and results, Wasik and Slavin's work
raises questions and issues that policymakers should consider with respect to
designing, implementing, and maintaining tutorial programs for educationally
disadvantaged children. For example, how should educators decide on which
tutoring program to use for children who are at risk for school failure? What
type of cost/benefit formula should educators apply? What must educators do to
ensure that the tutorial programs selected will have "sustaining effects"? And
how much are we prepared to spend to achieve these results?



Junior high schools, or more accurately the middle-grade schools, are both major
socializing institutions and critical academic "turning points" in the lives of
America's young adolescents. Mac Iver and Epstein (Responsive Practices in the
Middle Grades: Teacher Teams, Advisory Groups, Remedial Instruction, and
School Transition Programs) examine the use and principals' perceived effects of
(a) interdisciplinary teacher teams, (b) homeroom or group advisory periods, (c)
remedial instruction programs, and (d) school transition programs; these are
school practices that many educators believe respond to the needs of young
adolescents. The authors base their analysis on data collected from a national
sample of principals in public schools with grade seven. Are there substantial
benefits to a school and its students if schools have group advisory periods,
establish interdisciplinary teams, offer remedial instruction, and conduct
programs for students' smooth transition to and from the middle grades?

Generally, the results of Mac Iver and Epstein's analysis suggest that most of
the school practices they studied produce modest benefits. Mac Iver and Epstein
also find that principals report a stronger school program overall when they
invest heavily in interdisciplinary teacher teams to create supportive
conditions for students and teachers. In addition, principals expect fewer
students to leave high school before graduating when the school uses supportive
advisory group activities or responsive remediation programs. Another finding
from Mac Iver and Epstein's study is that extensive school transition programs
reduce the number of students who have to repeat the grade immediately
following the transition.

Based on their data, Mac Iver and Epstein predict that if schools conduct
arranged group advisory activities weekly, as opposed to infrequently, then
these schools may prevent 1 percent of their students from dropping out of high
school before they graduate. They also predict that when schools provide an
extra subject period during the school day to students who need coaching or
remedial help, then these schools are likely to reduce their dropout rates by
more than 1 percent.

The above results, as well as others from the Mac Iver and Epstein study, have
important implications for the improvement of education in the middle grades.
They suggest, for instance, that schools must make sure that responsive practices
are implemented properly. They also imply that not all practices are equally
beneficial, that different practices may require different implementation
strategies, and that the best way to realize the full benefits of a practice may be to
combine or mix it with some other practice(s).



Linda F. Winfield's article (School Competency Testing Reforms and Student
Achievement: Exploring a National Perspective) tackles competency testing,
another important school practice. Over the last two decades, a number of
education reform initiatives have been designed to increase accountability and
to improve student achievement outcomes. As Winfield indicates, attempts to
improve student performance included "increasing graduation requirements and
implementing assessment programs that define both standards of performance for
students and standards of accountability for the educational system."
Increasingly, states and local school districts have used minimum competency
tests (MCT) as a principal means to achieve reform.

Winfield's exploratory study examines "the relationship between school-level
minimum competency testing (MCT) programs and student reading proficiency
as measured by the 1983-1984 National Assessment of Educational Progress
(NAEP)." Winfield compares student-level proficiency outcomes for whites,
blacks, and Hispanics after adjusting for selected individual and school-level
differences for the 4th-, 8th-, and 11th-grade NAEP samples.

Do MCT programs make a difference? Does student proficiency improve as a
result of these programs? The results of Winfield's investigation generally
show a "higher level of reading proficiency for students in grades 8 and 11
attending schools with MCT programs compared with their counterparts in schools
without such programs." MCT programs, however, seem to have no effect for grade
4 students.

Winfield's work brings forward important policy issues and questions, including:
how and under what conditions should educators use testing to improve
performance and reform schools? What care and caution must be taken when using
these kinds of tests? How should NAEP, or any other type of national assessment
effort, be used by educators to make or influence policy? What guidelines should
there be on the use of NAEP as an instrument for school reform? Who should
establish these "rules of use"? How should they be enforced? These are only a
few of the policy concerns that Winfield's study raises.



Ability grouping has remained one of the most controversial issues in
education. Against this backdrop, Slavin (Achievement Effects of Ability
Grouping in Secondary Schools: A Best-Evidence Synthesis) presents a
comprehensive review of research that has evaluated the effects of ability
grouping on student achievement in secondary schools. Slavin reviews 6
randomized experiments, 9 matched experiments, and 14 correlational studies
that compared ability grouping to heterogeneous plans covering periods of from
one semester to five years. Ability grouping is "any school or classroom
organization plan that is intended to reduce the heterogeneity of instructional
groups; in between-class ability grouping the heterogeneity of each class for a
given subject is reduced, and in within-class grouping the heterogeneity of
groups within the class (e.g., reading groups) is reduced."

Slavin's "best-evidence synthesis" review indicates that the effects of ability
grouping on student achievement, as measured by standardized tests, are
essentially zero at all grade levels. Slavin also concludes that (1) various
models of ability grouping are equally ineffective, (2) ability grouping is
equally ineffective in all subjects, notwithstanding the possible negative
effect of ability grouping in social studies, and (3) there are no consistent
positive or negative effects on students of high, average, or low ability who
are assigned to different levels of the same course.

The results of Slavin's analysis pose a fundamental policy question many
educators must confront and resolve: What justifies any form of ability grouping
when the evidence shows that ability grouping has very little, if any, effect on
student achievement?



Unlike Slavin, Adam Gamoran (The Variable Effects of High School Tracking)
focuses his analysis on variation among types of tracking, not on the presence or
absence of tracking or other types of ability-grouping practices. Gamoran states
that the "effects of tracking in high schools depend in part on the way tracking is
organized: To the extent that the structure of tracking varies across schools,
tracking's impact on achievement also varies." Using data from the National
Center for Education Statistics' High School and Beyond (HS&B), a national
survey of high schools and their students, Gamoran examines four structural
characteristics of tracking systems: (1) "selectivity": the degree of
homogeneity within tracks; (2) "electivity": whether students choose or are
assigned to track positions; (3) "inclusiveness": the subsequent educational
opportunities available to students; and (4) "scope": the breadth and
flexibility of track assignments.

The results of Gamoran's analysis point to the significant differences among
schools in the magnitude of track effects on mathematics achievement, and in net
average achievement on both verbal and mathematics tests. Schools with more
mobility in their tracking systems (i.e., allowing student movement from one
track position to another) produce higher math achievement overall. They also
have smaller gaps between tracks in both math and verbal achievement when
compared to schools with more rigid tracking systems. Moderately inclusive
systems (i.e., with relatively more students assigned to the college preparatory
program) also have less between-track inequality in math. Furthermore, overall
school achievement tends to rise in both subjects as inclusiveness increases.

With respect to differences between Catholic and public schools, Gamoran finds 
that Catholic schools have less inequality between tracks and higher productivity 
overall than do public schools, especially for math achievement. Gamoran at- 
tributes these Catholic school advantages partly to the way Catholic schools im- 
plement tracking. 

These findings, as well as others, are outcomes of Gamoran's rich analysis. They
show that the issue of tracking, and its effects, is detailed, varied, and
complex. Gamoran's work raises a number of basic education policy questions.
For instance, should there be any kind of tracking in schools? Or, should all
forms of school tracking be eliminated? On what bases should decisions be made
to track or not to track students? If tracking systems are to exist, what types
of systems should schools implement? How should they be structured? How should
students be assigned? How much homogeneity, flexibility, mobility, et cetera,
should there be within and between tracks? Certainly, educators must consider
and weigh the costs and benefits of tracking/not tracking, particularly in
terms of maximizing overall academic achievement, and at the same time,
maximizing access to learning opportunities for all students.



Community Involvement 



Community participation is seen as a critical component in efforts to help prevent
and solve many of the problems of educationally disadvantaged students.
Saundra Murray Nettles (Community Involvement and Disadvantaged Students:
A Review) addresses the effects of community involvement on students who
encounter multiple barriers and difficulties to success in schools. The first
part of Nettles' article (1) defines community involvement as "the actions that
organizations and individuals (e.g., parents, businesses, universities, social
service agencies, and the media) take to promote student development"; and (2)
describes and conceptualizes involvement in terms of four change processes:
conversion, mobilization, allocation of resources, and instruction.

Conversion is the process of bringing students from one belief, or behavioral
stance, to another. Mobilization involves more active participation of people and
organizations in the education process. Allocation refers to actions community
entities take to provide resources to children and youth. Instruction includes
activities to help students develop intellectually or learn the rules and
values that govern social relationships in the community.

The second part of Nettles' article examines the effects of the four kinds of
involvement noted above through a review of 13 evaluation studies of academic
intervention programs with significant input from community entities. An
example of the kinds of programs evaluated is the Chicago Area Project which
provided services to targeted youth, undertook community improvement, and
organized activities to prevent area delinquency.

In general, Nettles shows from the studies reviewed that community involvement
does have positive effects on school-related behaviors and achievement, on
student attitudes, and on risk-taking behavior. There are positive outcomes for
school attendance, persistence in school, pregnancy prevention, and attitudes
toward school. The effects range from small to substantial overall. Nettles also
indicates that the pattern of outcomes varies by community involvement type.
There is an overall pattern of positive effects for programs that are classified as
"allocation" or "instruction"; however, there is a mixed pattern for those
programs that combine the two types.

Nettles' work is important to educators and policymakers in a number of ways.
At the very least, it demonstrates that the general call for greater community
involvement is well founded and worthy of support. It also reveals that the
type and form of involvement may be the most important factors in achieving
success.



Whereas the Nettles article above reviews evaluation studies across 13 programs,
the McPartland and Nettles article (Using Community Adults as Advocates or
Mentors for At-Risk Middle School Students: A Two-Year Evaluation of Project
RAISE) examines and evaluates the effects on selected student outcomes of a
single project: Project RAISE, a multifaceted approach featuring outside adults
as school-based advocates and one-on-one mentors for at-risk students at seven
middle schools.

McPartland and Nettles find that after 2 years in operation, Project RAISE has
positive effects on improving student attendance and report card grades in
English, but not on promotion rates or standardized test scores. The authors point
out that the effects, though sizable, were not enough to neutralize the academic
risks with which students entered the program. They find that the positive results
were primarily accounted for by three of the seven sites evaluated.

McPartland and Nettles indicate that "some evidence supported interpretations
that, although strong one-on-one mentoring is not an essential component of an
effective program that uses outside adults to assist at-risk middle school students,
the RAISE model is much more likely to show positive effects when one-on-one
mentoring has been strongly implemented." They also state that success may
depend as well on the size and composition of the group of students served.

The research work of McPartland and Nettles raises the following questions:
what must educators do to (1) locate and recruit mentors, and (2) facilitate
ongoing successful relationships between mentors and at-risk students?



Policy Issues: Chapter 1



Since it was first enacted in 1965, Chapter 1 (previously called Title I) has
served as the cornerstone of the federal government's effort to help public
schools meet the special educational needs of educationally disadvantaged
children. Over the years, Chapter 1 has provided state and local school
districts with about $80 billion. Chapter 1 is the largest categorical federal
elementary and secondary education program, and is targeted to improving the
academic achievement of at-risk students. With this as a backdrop, each article
in this section focuses on some aspect(s) of this major federal program. Taken
together, these articles cover a number of salient issues that should be
considered by policymakers and educators as they continue their deliberations
and discussions on making Chapter 1 more responsive to its intended
beneficiaries: educationally disadvantaged school children.



Winfield's 2-year qualitative study (Lessons From the Field: Case Studies of 
Evolving Schoolwide Projects) describes changes that occurred in one of the 
nation's largest urban school systems following passage of the Hawkins-Stafford 
Amendments, which brought on what some consider to be the most sweeping 
changes in the history of Chapter 1 legislation. The Hawkins-Stafford Amendments, 
as Winfield indicates, allow schools to use Chapter 1 funds for schoolwide 
projects (SWPs) when 75 percent or more of the students in these schools 
are economically disadvantaged. A major goal of the amendments is to upgrade 
and improve the entire school program and to minimize administrative and instructional 
program fragmentation. Winfield uses a case study method to describe 
the central office and system role changes at the elementary school level at 11 
sites. 

Winfield points out that the school system's approach to schoolwide projects involves 
five main features: (a) a whole-school approach based on "effective 
schools" research; (b) a school-based management strategy requiring school staff 
and parent involvement; (c) an ongoing monitoring process to gauge individual 
student, class, and school performance; (d) a district-based support system at 
central and subdistrict offices to provide staff and parent training; and (e) a concentration 
of resources so that funds beyond the minimum amounts would be 
committed from Chapter 1 and operating budgets. 

The results of Winfield's case study reveal that while schools use Chapter 1 funds 
in a number of ways, nearly all schools use funds to establish an additional teach- 
ing position to lower the teacher-student ratio during math and reading instruc- 
tion. The author concludes that schoolwide projects (SWPs) have the potential for 
improving the learning outcomes of educationally disadvantaged students. But 
this potential can be realized "only if adequate support for change is provided at 
the central office or district level and if sufficient resources are devoted to human 
resources and professional development." 

Winfield's study raises many important questions. For example, if schools are to 
change from offering a traditional Chapter 1 program to implementing a more in- 
tegrated one that focuses on all students, are district central offices prepared to 
make analogous changes? Are district offices committed to and prepared to make 
the necessary structural and operational changes to expand the number of schoolwide 
project schools in their district and to provide effective coordination and 
delivery of direct services to SWP schools? What will it take, and at what cost, 
for districts to move away from traditional bureaucratic procedures to more 
flexible, responsive-oriented approaches? 



The article by Slavin and Madden (Modifying Chapter 1 Program Improvement 
Guidelines to Reward Appropriate Practices) looks at possible effects of Chapter 
1 assessment guidelines. Slavin and Madden discuss how new accountability 
guidelines have helped educators to focus on the outcomes of Chapter 1 
programs. However, these guidelines may also result in rewarding counterproduc- 
tive practices. They may possibly hinder early intervention programs like pre- 
school, kindergarten, and first-grade programs that increase the baseline for later 
school performance gains. 

Moreover, Slavin and Madden say that these guidelines may reward student reten- 
tions, "which significantly increase normal curve equivalent (NCE) gains. They 
may also focus teaching on narrow, easily measured objectives." 

Their article offers a different approach to Chapter 1 accountability that rewards 
schools for reducing the number of students who fail to meet minimum standards 
on tests that are relevant and broad-based. Students who are held back or untested 
would be counted as not meeting minimum standards. Services designed to im- 
prove programs would be expanded substantially and would be provided to all 
Chapter 1 schools. 

Slavin and Madden discuss the advantages and problems in using their proposed 
accountability approach. For instance, the authors note that "a school undergoing 
major demographic changes might appear to be declining in the percentage of stu- 
dents meeting minimum standards. This could be dealt with by allowing schools 
to submit demographic data (e.g., increases in the percentage of students qualify- 
ing for free lunch) to explain any declines." Or, "it may be unfair to hold schools 
fully responsible for students new to the school. This problem might be solved by 
counting only students in the school for at least two years." These are but a few 
of the potential problems and solutions Slavin and Madden address in their dis- 
cussion on modifying Chapter 1 improvement guidelines. 
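
The counting rule described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the record fields, the cutoff score, and the function name are hypothetical and do not come from Slavin and Madden's article. It treats retained and untested students as not meeting the minimum standard and, as one of the possible adjustments the authors mention, counts only students enrolled in the school for at least two years.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudentRecord:
    # All field names are hypothetical, chosen for illustration only.
    score: Optional[float]   # None when the student was not tested
    retained: bool           # True when the student was held back
    years_in_school: int     # years enrolled at this school


def percent_not_meeting(records, cutoff, min_years=2):
    """Percentage of counted students who fail to meet the minimum standard.

    Retained and untested students are counted as not meeting the standard.
    Students with fewer than `min_years` at the school are left out.
    """
    counted = [r for r in records if r.years_in_school >= min_years]
    if not counted:
        return 0.0
    failing = sum(
        1 for r in counted
        if r.retained or r.score is None or r.score < cutoff
    )
    return 100.0 * failing / len(counted)


records = [
    StudentRecord(score=62.0, retained=False, years_in_school=3),  # meets standard
    StudentRecord(score=35.0, retained=False, years_in_school=4),  # below cutoff
    StudentRecord(score=70.0, retained=True, years_in_school=2),   # held back
    StudentRecord(score=None, retained=False, years_in_school=2),  # untested
    StudentRecord(score=20.0, retained=False, years_in_school=1),  # new, not counted
]
rate = percent_not_meeting(records, cutoff=40.0)  # cutoff of 40 is an assumed value
print(f"{rate:.1f}% of counted students do not meet the standard")
```

Under such a rule, a school could not improve its figure by retaining students or leaving them untested, which is the incentive the proposal is meant to remove.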

The last sentence in Slavin and Madden's article poses a most important question: 
what must policymakers and educators do to ensure that Chapter 1 funds 
"are buying the most effective programs possible and that Chapter 1 policies are 
rewarding school practices conducive to the success of all children"? 



The final article, by Stringfield, Billig, and Davis (Chapter 1 Program Improvement: 
Cause for Cautious Optimism and a Call for More Research), provides the 
results of a multistate survey of schools targeted for program improvement. The 
authors state: "The program improvement provisions in the Hawkins-Stafford 
Amendments to Chapter 1 rest on the optimistic premise that school-level accountability 
pressures directed at Chapter 1 will lead to higher academic achievement 
for educationally disadvantaged students." While the legislation may be 
unrealistic in assuming that improvement is primarily an act of will, it does correctly 
focus on the school as the proper level of change. 

To determine the local responses to the new Chapter 1 provisions, Stringfield and 
his colleagues studied the results of a survey of principals of over 200 schools 
identified for program improvement in three states. The researchers found that 
more than two-thirds of the responding schools had begun to implement programmatic 
changes. They report that fully 84 percent of the respondents supported the 
legislative provisions. The authors' results suggest that when program improvement 
efforts are carefully implemented, they can lead to greater understanding of 
the role of Chapter 1 in schools and to better staff perceptions of compensatory 
education. Stringfield, Billig, and Davis conclude that more research is needed to 
study the effects of the Chapter 1 legislation, and to provide options to low-per- 
forming schools. 



The articles in this volume give the reader a better understanding of important individual, 
social, and institutional conditions that frequently contribute to students 
being at-risk of school failure, and suggest ways of correcting these conditions. 
At the same time, they highlight many preventive measures that could be taken to 
avoid placing children at-risk and to relentlessly ensure success in school. Individually 
and collectively, the 13 articles add to our knowledge of what really 
matters, what does make a difference, and what policy decisions should be made 
so that children from educationally disadvantaged backgrounds can triumph in 
school. In sum, these articles not only provide us with innovative directions for 
future research, but they also stimulate our thinking about a range of new and fas- 
cinating education questions and ideas — frequently overlooked — that originate 
from basic and applied research. 



Ronald J. Pedone 
Editor 

Office of Research 

Inner-city pupils differed from others in racial and ethnic backgrounds, 
family incomes, parents' education and employment, and family composition. 



DEMOGRAPHIC DISPARITIES OF 
INNER-CITY EIGHTH GRADERS 



Education of inner-city children is often characterized by high 
drop-out rates and low achievement test scores. Drop-out rates are 
over 40% in some cities (Hahn, Danzberger, and Lefkowitz 1987), 
and test scores on average are below the national norm or ranked 
the lowest among children in different communities (Ornstein and 
Levine 1989; National Center for Education Statistics 1990). These 
problems persist despite tremendous improvement efforts devoted 
to schools. In fact, the problems in some cities have intensified 
(Council of Chief State School Officers 1988). 

Understanding the nature of the problems and searching for 
solutions continue to be educational research priorities. These 
priorities are reflected in the recent establishment of the National 
Center for Education in the Inner Cities at Temple University, 
funded by the Office of Educational Research and Improvement of 
the U.S. Department of Education. 

One major source of the education problems in the inner city is 
rooted in the students' demographic and socioeconomic back- 
grounds. Previous studies have shown that what students bring to 
schools greatly determines the difference among schools (e.g., 
Coleman et al. 1966). Studies also have found that poverty, unstable 
families, and other social disturbances in a community are major 
roots of the problems in urban education (Walberg and Kopan 1972; 
Passow 1977; Sinclair and Ghory 1987; Council of Chief State 
School Officers 1988; Casserley and Kober 1990). 

Unfortunately, very little systematic and comparable national 
data are available for researchers to further their understanding of 
the demographic context of inner-city children. Most information 
from the local school district is embedded in or mixed with the 
information on the larger context of urban education, which in- 
cludes middle-class and affluent neighborhoods. 

One reason for the lack of specific inner-city education informa- 
tion in the nation is the lack of a consistent definition of inner city 
and the difficulty in drawing a clear boundary of the inner city 
within urban areas. The so-called inner city implies disadvantaged 
urban communities where physical deterioration is evident and 
social disturbances, such as crime and illegal drugs, are widespread. 
These communities, however, do not have clear and consistent geo- 
graphic boundaries; they may be confined to the urban center or 
scattered throughout the city. 

This study therefore attempts to (a) develop a definition of 
inner-city children based on information readily available from 
schools and communities and (b) apply this definition to a national 
data source to develop a demographic and socioeconomic profile 
of children in the inner city. Results should further our understand- 
ing of the unique "culture" or environmental background of inner- 
city children and reveal special education needs of these children. 

DEFINITION 



This study used two criteria to identify inner-city children. The 
first is the community location. An inner city must be within a 
Standard Metropolitan Statistical Area (SMSA), a variable gener- 
ally available in most national data bases. The second criterion is 
the community's poverty level. As mentioned earlier, an inner city 
implies a community where poverty and other social problems are 
evident. The information on these aspects is hard to collect, partic- 
ularly if the information applies only to certain sections within a 
geographical area. A proxy measure for the socioeconomic condi- 
tion is a necessary and practical alternative. One such measure 
available from public schools is the percentage of students partici- 
pating in free or reduced-price lunch programs. 

In this study, urban schools were assumed to be located in com- 
munities where a substantial number of families were on welfare if 
more than 50% of the students participated in free or reduced-price 
lunch programs. These schools were considered to be in the inner- 
city, and students attending them were designated as inner-city 
children. 

For comparison, schools in suburban and rural settings were also 
classified by the percentage of students in free or reduced-price 
lunch programs. Schools in both types of communities with more 
than 50% of the students in free or reduced-price lunch programs 
were labeled as disadvantaged schools. The other schools, with less 
than 50% of the students in free or reduced-price lunch programs, 
were labeled advantaged schools. 
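
The classification rule described in this section can be sketched as a short function. This is a minimal illustration rather than the authors' actual procedure: the function name, the category labels, and the input encoding are assumptions. The text does not say how a school at exactly 50% is treated, so the sketch places it in the advantaged (or "other urban") group and notes this in a comment.

```python
def classify_school(location: str, lunch_pct: float) -> str:
    """Assign a school to one of six community types.

    location  -- "urban" (within an SMSA central area), "suburban", or "rural"
                 (this three-way encoding is an assumption of the sketch)
    lunch_pct -- percentage of students in free or reduced-price lunch programs
    """
    # "More than 50%" follows the text; exactly 50% is not addressed there,
    # so it is treated here as not disadvantaged (an assumption).
    disadvantaged = lunch_pct > 50.0

    if location == "urban":
        return "inner city" if disadvantaged else "other urban"
    if location in ("suburban", "rural"):
        label = "disadvantaged" if disadvantaged else "advantaged"
        return f"{location} {label}"
    raise ValueError(f"unknown location: {location!r}")


# Hypothetical schools, for illustration only.
for loc, pct in [("urban", 62.0), ("urban", 18.0), ("suburban", 55.0), ("rural", 30.0)]:
    print(loc, pct, "->", classify_school(loc, pct))
```

Using a single threshold on lunch-program participation keeps the definition tied to information that schools already report, which is the practical advantage the study emphasizes.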



The preceding definition was applied to data collected by the 
National Educational Longitudinal Study of 1988 (NELS:88). 1 
Results of this study showed that over 25% of the eighth graders 
(833,000 students) in 1988 were enrolled in urban schools (7% in 
the inner city, and another 18% in other urban settings). The total 
eighth-grade enrollment was estimated at 3.3 million in 1988. 2 
Assuming a similar percentage distribution of students in other 
grade levels from kindergarten to twelfth grade, there are over three 
million students in inner-city schools whose educational attainment 
will affect the well-being of this country. 
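
The arithmetic behind the "over three million" figure can be reproduced as follows. This is a sketch of the extrapolation under the assumption stated in the text (the same inner-city share in every grade from kindergarten through twelfth grade); the constant names and the count of 13 grade levels are supplied here for illustration.

```python
EIGHTH_GRADE_ENROLLMENT = 3_300_000  # estimated total eighth graders in 1988
INNER_CITY_SHARE = 0.07              # share of eighth graders in inner-city schools
GRADE_LEVELS = 13                    # kindergarten plus grades 1 through 12


def estimate_inner_city_enrollment(per_grade, share, grades):
    """Extrapolate one grade's inner-city share to all grade levels."""
    return per_grade * share * grades


per_grade_inner_city = EIGHTH_GRADE_ENROLLMENT * INNER_CITY_SHARE
total_inner_city = estimate_inner_city_enrollment(
    EIGHTH_GRADE_ENROLLMENT, INNER_CITY_SHARE, GRADE_LEVELS
)
print(f"Inner-city eighth graders: about {per_grade_inner_city:,.0f}")
print(f"Inner-city students, K-12: about {total_inner_city:,.0f}")
```

The estimate assumes equal enrollment in every grade, so it should be read as an order-of-magnitude figure rather than a count.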

Results of this study also show that sizable numbers of eighth 
graders were enrolled in suburban and rural disadvantaged schools 
(3% and 5%, respectively). Thus a total of 15% of all eighth graders 
in 1988, including children in the inner city, were attending schools 
in which more than 50% of students participated in free or reduced- 
price lunch programs. 



A well-known phenomenon in inner-city schools is the high 
concentration of racial/ethnic minorities. Based on NELS:88 data, 
eight of every ten inner-city eighth graders in 1988 were minorities. 3 
Overall, the largest group was African American (48%), followed 
by Hispanic (25%). Only about 20% of students in inner-city 
schools were white. Smaller percentages of Asian Americans and 
Native Americans made up the rest (see Table 1). 

This racial/ethnic distribution of students differed significantly 
from that of students in other communities. In advantaged sub- 
urban and rural schools, for example, over 80% of the eighth grad- 
ers in 1988 were white. Only in disadvantaged suburban areas were 
there similar high concentrations of minorities; the majority were 
Hispanic. 

A further examination of student distribution revealed that the 
composition of race/ethnicity in the inner city varied by geographic 
region. Although African Americans were the most dominant group 
in the north central region (59%), the South (50%), and the North- 
east (43%); Hispanics were the most dominant group in the West 
(45%). As expected, there were also more Asian Americans and 
Native Americans in the West. 

High proportions of minorities are educated in the inner city. As 
shown in Table 2, 25% of African American and 17% of Hispanic 
students were enrolled in inner-city schools, as compared to 2% of 
white eighth graders enrolled in 1988. Thus the quality of inner-city 
education has a greater impact on minorities than on the majority 
in this country. The improvement of inner-city education is a 
critical step toward raising the overall education attainment level 
of minorities. 

There is also a sizable proportion of minority students enrolled 
in disadvantaged schools in suburban and rural areas. These 
schools, together with inner-city schools, enrolled a total of 49% of 
Hispanic students, 40% of Native American students, and 36% of 
African American students. In comparison, just 7% of white stu- 
dents were enrolled in disadvantaged schools. Clearly, the quality 
of these schools affects minorities more than it affects majority 
students. 



Inner-city schools are a melting pot for students of different 
backgrounds. About one-quarter of these students were classified 
as language minorities whose dominant language at home is not 
English. About 82% of the language minorities were Hispanic and 
Asian American. As expected, the percentage figures varied by 
region. In areas where there were more Hispanics, the percentage 
of language minorities was higher. Similarly, in disadvantaged 
suburbs where there were high concentrations of Hispanics, the 
percentage of language minorities was also very high (38%). In 
disadvantaged rural areas, it was 28% (see Table 3). 

Under the Bilingual Education Act (P.L. 100-297), schools are 
encouraged to implement special programs using bilingual educa- 
tional practices, techniques, and methods in order to provide equal 
educational opportunity for children with limited English profi- 
ciency and to promote educational excellence. The high concentra- 
tion of language minorities in the inner city presents unique chal- 
lenges to school systems. NELS:88 data showed that inner-city 
schools offered special language programs and have more foreign 
language courses for students to choose from than do other schools. 

UNSTABLE FAMILIES 

Another widespread phenomenon in the inner city is the rela- 
tively high number of unstable families. NELS:88 data showed that 
less than one-half of the students in the inner city lived with both 
natural parents (44%), as compared to over 60% of their counterparts 
in other communities (see Table 4). A very high percentage of 
inner-city children lived with their mothers only (31%), about twice 
the percentage of students in other communities. Fourteen percent 
of inner-city children lived with their mother and a male guardian, 
and 6% lived with other relatives or nonrelatives. 

The difference in family composition was even more pro- 
nounced among African Americans. About 42% of African Amer- 
ican children in the inner city lived with their mother only, and less 
than 30% lived with both natural parents (see Table 5). High 
percentages of Hispanic and Native American children lived with 
their mother only (25% and 29%, respectively). In contrast, Asian 
Americans showed a high percentage of children living with both 
natural parents (75%). 

NELS:88 data also showed that a large percentage of inner-city 
students' parents were unmarried (46%), including divorced, sep- 
arated, never married, widowed, and cohabiting. In contrast, un- 
married parents in other communities were less than 26% (see Table 
6). A high percentage of children reported that their parents were 
never married (12%), probably indicating that many inner-city 
children were born out of wedlock. Consistent with data shown in 
Table 4, a much higher percentage of African American students' 
parents in the inner city were unmarried (63%), followed by Native 
Americans (45%), and Hispanics (36%). 

Considerably more inner-city children than children in other 
communities live in unstable families in which children may re- 
ceive inadequate care and support for success in school (Peng and 
Lee 1991). Such home environments are undoubtedly stressful to 
children and can affect teachers and school environments as well. 

TABLE 5 

Percentage Distribution of 
Inner-City Students by Family Composition 

Race/Ethnicity      M and F   M and MG   F and FG   M only   F only   Other
Asian                 73.0       2.1        0.6       9.9      4.9      9.5
African American      29.6      16.3        1.4      41.5      2.3      8.9
Hispanic              54.3      13.6        1.8      24.6      2.5      3.5
Native American       38.1      15.8        3.3      28.5      2.5     11.8
White                 60.8      12.6        3.0      18.0      3.7      2.0

NOTE: M = mother; F = father; MG = male guardian; FG = female guardian. Totals 
in each row may not equal 100.0 because of rounding. 



UNDEREDUCATED PARENTS 

Parents in inner cities and disadvantaged suburban and rural 
communities had significantly lower educational attainment than 
parents in advantaged communities. In inner cities, about 22% of 
parents did not graduate from high school as compared to 8% of 
parents in other urban communities. A similar pattern was observed 
in suburban and rural communities (see Table 7). 

A much higher percentage of Hispanic parents (41%) and Asian 
American parents (38%) did not finish high school, compared to 
14% of African American parents and 14% of white parents who 
did not finish high school (see Table 8). A high percentage of Asian 
American parents who did not finish high school may have been 
refugees from Southeast Asia. 

DEPRESSING ECONOMIC CONDITIONS 

The majority of children in the inner city live in poverty-stricken 
conditions. The unemployment rates of inner-city parents were 
highest among the six community types. About 15% of the inner- 
city mothers and about 9% of the inner-city fathers were unem- 
ployed when the parent data were collected in the spring of 1989. 
In the inner city, an additional 6% of mothers and 7% of fathers 
were either retired or disabled (see Table 9). 

Peng et al. / INNER-CITY EIGHTH GRADERS 

TABLE 8 

Percentage Distribution of Inner-City 
Parents' Education Level by Race/Ethnicity 

                                                  African            Native
Education Level               Asian   Hispanic   American   White   American
Unknown                         5.4      5.3        3.4       2.0     17.2
Did not finish high school     37.8     41.2       14.3      14.2     17.0
High school graduate or
  general equivalency
  diploma                      11.9     18.0       25.3      30.0     21.2
> High school and
  < four-year degree           30.3     27.9       48.2      38.1     40.4
College graduate               11.6      3.9        6.0      10.5      0.0
MA or equivalent                1.7      2.9        1.9       4.5      0.0
Ph.D., M.D., or other           1.2      0.7        0.9       0.7      4.2

NOTE: Totals in each column may not equal 100.0 because of rounding. 

Family income was lowest in inner cities. As shown in Table 10, 
about 48% of students lived in families whose annual income was 
below $15,000 in 1988. In contrast, families with annual family 
incomes of less than $15,000 in other urban areas totaled 19% and 
in advantaged suburban communities only 12%. In 1988 the pov- 
erty threshold for a family of four was $12,092 and $16,149 for a 
family of six. The percentage of all persons in the country below 
the poverty threshold was about 13% (U.S. Bureau of the Census 
1989). 

Among racial/ethnic groups in the inner city, a higher percentage 
of African Americans and Hispanics had income levels below 
$15,000 (55% and 53%, respectively). In contrast, about 28% of 
white families were below this income level. 

DISCUSSION 

Children in the inner city differ from children in other commu- 
nities in many ways. First, they are predominantly minorities. This 
trend is likely to intensify, because more African Americans moved 
from the rural South into cities (Lemann 1986a, 1986b), and more 
Hispanics migrated into urban areas (Wilson 1987). Minorities also 
have higher fertility rates (Hodgkinson 1989). Second, a much 
greater proportion of children in the inner city come from very poor, 
undereducated, or unstable families. Each of these situations rep- 
resents an educational disadvantage, and thus most of the children 
in the inner city face multiple disadvantages. To overcome these 
disadvantages is a tremendous challenge to students and educators. 
Since some communities are unable to provide the necessary re- 
sources, their education problems may be further complicated. 

This study also reveals that over one-quarter of the school- 
children in this nation were enrolled in urban schools, a significant 
portion of them in the inner city. The quality of schooling in the 
inner city and other urban areas certainly affects the educational 
opportunities of these children. This is particularly serious, because 
a large proportion of the nation's minorities are educated in inner- 
city schools. The effectiveness of these schools greatly affects the 
overall achievement of minorities. Thus the improvement of inner- 
city education is essential in the effort to increase the educational 
attainments of minorities. 

The multitude of demographic disparities in the inner city points 
to the need for instruction that considers each student's readiness, 
motivation, interest, learning skills, and other factors in the pre- 
scription of classroom materials and the choice of teaching strate- 
gies. It also points to the need for educators to understand the culture 
of students' families — their parenting practices, value systems, 
attitudes toward education, as well as the health, social, and psy- 
chological problems associated with urban conditions. Educators 
need to know more about what strengths and deficiencies students 
bring to school in order to provide them with relevant and effective 
programs. As pointed out by Knapp and Turnbull (1990), schooling 
designed for children of traditional families may not fit well with 
the children from the inner city, since many of them do not have 
the kind of preparation, family support, or the mainstream "culture" 
typically required for success in school. Thus different schooling 
practices, instructional strategies, and curriculum may be required, 
including new ways to interact with parents to bring a closer 
connection among schools, families, and the community. Alternatives 
to conventional wisdom, as suggested by Knapp and Turnbull 
(1990), are worth exploring. 

To be more specific, effective inner-city school programs require 
further understanding of the educational function taking place at 
home and in the community. We need to know (a) how demographic 
diversity and poverty have impeded the conventional or current 
school education, (b) how the educational needs of inner-city 
students differ from those of students with different socioeconomic 
backgrounds, (c) how the home and community environments 
affect students' aspirations and motivation, and (d) the extent to 
which parents can work with schools to bring forth the potential of 
their children. In other words, we need to know more about what 
parents do or do not do at home that affects student learning and 
more about how to engage parents and the community in education. 
Previous studies have found that parents of low socioeconomic 
status tend to be less involved in the schooling process, less com- 
municative with their children, and more likely to impose strict 
rules at home for their children without complementary support and 
assistance (e.g., Peng and Lee 1991). Many inner-city families may 
have these problems. Thus helping parents educate their children 
at home is another challenge to schools and other community 
service organizations. 

The socioeconomic disparities in inner cities also point out that 
many domestic problems that affect student learning cannot be 
resolved by schools alone. Addressing poverty and unstable fami- 
lies, for example, may require other social assistance. Collaboration 
with nonschool agencies may be needed to provide a comprehen- 
sive plan to improve education in inner cities. 



NOTES 

1. NELS:88 is a study sponsored by the National Center for Education Statistics, U.S. 
Department of Education. It involved 24,599 eighth graders from a sample of 1,035 high 
schools across the country selected to represent a total of about 39,000 schools with eighth 
graders. Within each school, approximately 26 students were randomly selected. The base- 
line data collection was completed in 1989. The response rate was 98.9% for schools and 
93.4% for students. 
The second follow-up survey is being completed, and the third follow-up survey is 
scheduled for 1992. Details of the study design and the data content are described in the 
study's data file documentation (Ingels et al. 1990). The baseline data were the bases for this 
study. 

It should be noted, however, that the study excluded mentally handicapped students, 
students not proficient in English, and students having physical or emotional problems. Thus 
estimates of students for some subgroups are underreported. 

2. The total enrollment of the seventh graders in 1987-88 in public elementary and 
secondary schools was 2,910,432. This number plus 7.3% of total private school enrollment 
in 1987-88 adds up to about 3.3 million (see National Center for Education Statistics 1990, 
56, 68). Those students were assumed to be eighth graders in the 1988-89 school year when 
NELS:88 was conducted. 

3. Because of the highly stratified and clustered sample design, all analyses in this study 
required the use of sample weights to obtain unbiased estimates. A sample weight is the 
inverse of the probability of being selected, adjusted for nonresponses. Furthermore, the 
standard errors of statistics for this complex sample design were adjusted by a design effect. 
The design effect is a measure of the impact of departures from simple random sampling on 
the precision of sample estimates. For any statistical estimator, such as a mean or a proportion, 
the design effect is the ratio of the exact variance of a statistic derived from the 
complex sample design to that obtained from the formula for a simple random sample of the 
same size. Design effects for subgroups vary and are generally smaller than the design effect 
for the total group. For simplicity, the mean design effect of 2.5 for all students was used in 
this analysis. Detailed descriptions of sample weights and design effect are included in the 
data file User's Manual (Ingels et al. 1990). The following formula was used to calculate 
the standard error of a percentage: $SE = DEFF^{1/2} \times [p(1 - p)/n]^{1/2}$, where p is the weighted 
percentage of respondents giving a particular response, n is the size of the sample, and DEFF 
is the mean design effect of 2.5. All group differences cited in the text are significant at the 
.05 level. 
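
The standard-error adjustment in note 3 can be illustrated with a short sketch. It assumes the formula reads SE = DEFF^(1/2) x [p(1 - p)/n]^(1/2), the usual design-effect correction to the simple-random-sample standard error, and it assumes p is expressed as a proportion between 0 and 1. The function name and the example inputs are hypothetical; only the mean design effect of 2.5 comes from the note.

```python
import math


def design_adjusted_se(p: float, n: int, deff: float = 2.5) -> float:
    """Standard error of a proportion, inflated by a design effect.

    p    -- weighted proportion giving a response (0 to 1; an assumption here)
    n    -- unweighted sample size of the group
    deff -- design effect; 2.5 is the mean value cited in note 3
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    srs_se = math.sqrt(p * (1.0 - p) / n)  # simple-random-sample standard error
    return math.sqrt(deff) * srs_se        # inflate for the complex design


# Hypothetical example: a proportion of 0.44 in a subgroup of 1,000 students.
se = design_adjusted_se(0.44, 1000)
print(f"SE = {se:.4f}; approximate 95% interval half-width = {1.96 * se:.4f}")
```

With a design effect of 2.5, each standard error is about 1.58 times the value a simple random sample of the same size would give, so group differences must be correspondingly larger to reach significance.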

Educational Levels of Adolescent 
Childbearers at First and Second 
Births 

DIANE SCOTT-JONES 

University of Illinois at Urbana-Champaign 

This study employed the 1985 birth records of a midwestern state to 
assess the educational levels of white, black, and Hispanic adolescents, 
15-19 years of age, having first and second births. There were age and 
ethnic differences in the relationship of first and second births to educational 
levels of adolescents. Marriage was differentially related to 
educational level among the three ethnic groups. The educational level 
of fathers, when reported, was significantly correlated with the adolescent 
childbearers' educational level. Implications of these findings for future 
research and for programs and policies related to adolescent pregnancy 
prevention and to educational improvement are discussed. 

Sexual activity among adolescents has increased in the past two decades 
(Centers for Disease Control 1991). In 1988, 51 percent of white and 
59 percent of black adolescents were sexually active, in contrast with 
27 percent of white and 46 percent of black adolescents in 1970. 
Birthrates of adolescent females increased in 1988, after almost two 
decades of decline (National Center for Health Statistics 1990). The 
1988 adolescent birthrate, however, remains lower than the 1972 rate. 
Although the majority of American adolescents who give birth are 
white, the proportion of blacks among adolescent childbearers is almost 
double their proportion of the adolescent population. Blacks are 15 
percent of the adolescent population and 29 percent of adolescent 
childbearers (Children's Defense Fund 1988). Hispanic adolescents 
have higher birthrates than do white adolescents but lower rates than 
blacks (Children's Defense Fund 1990). 

The developmental life course of adolescents making early transitions 
into childbearing may be altered substantially. Other developmental 
transitions may affect and be affected by early childbearing. The completion 
of schooling necessary for entering the adult labor force successfully 
is a major developmental task that adolescent childbearers 
must accomplish. This article reports a descriptive study of the educational 
levels of adolescents having first and second births. 

© 1991 by The University of Chicago. All rights reserved. 
0195-6744/91/9904-0005$01.00 
August 1991 

Adolescent childbearers have lower educational attainment than do 
those who delay childbearing until after adolescence (Marini 1984). 
The negative effect of adolescent childbearing on educational attainment 
is greater the younger the adolescent (Mott and Marsiglio 1985). Differences 
in educational attainment between adolescent mothers and 
those who delay childbearing remain throughout the adult life course; 
however, educational attainment is a stronger predictor of black adult 
women's income than is the experience of adolescent pregnancy (Scott- 
Jones and Turner 1990). 

Some progress in the educational attainment of adolescent mothers 
has been made. Currently, a greater proportion of adolescent mothers 
complete high school than in the past (Upchurch and McCarthy 1989). 
Black adolescent mothers are more likely than white adolescent mothers 
to complete high school (Upchurch and McCarthy 1990; McCrate 
1988). Of 19-year-olds giving birth in 1988, 65 percent of black mothers 
and 60 percent of white mothers had completed 12 or more years of 
school. Of 18-year-olds giving birth in 1988, 49 percent of black mothers 
and 43 percent of white mothers had completed 12 or more years of 
school (National Center for Health Statistics 1990). An assessment of 
the educational attainment of black adolescent mothers 17 years after 
their first pregnancy indicated that two-thirds had completed high 
school, one-third had continued formal education beyond high school, 
and 5 percent were college graduates (Furstenberg et al. 1987). 

Although educational attainment is depressed by the experience of 
adolescent pregnancy, a complicating possibility is that adolescents 
who are not actively engaged in school and are not performing well 
may be more likely to become pregnant. Young adolescents may be 
moving along a developmental trajectory that leads to low educational 
expectations and low academic achievement prior to the occurrence 
of pregnancy. Both black and white, and male and female, adolescents 
who have high educational expectations are less likely to be sexually 
active than adolescents with low educational expectations (Scott-Jones 
and White 1990). Many pregnant adolescents had poor basic skills 
before the pregnancy occurred (Rindfuss et al. 1984). In the National 
Longitudinal Survey of Youth, adolescents who became pregnant after 
dropping out of high school had a greatly reduced probability of 
eventually graduating. For adolescent females enrolled in school, however, 
childbirth was not predictive of subsequently dropping out; parental 
education, two-parent family structure, reading materials in the 
home, enrollment in a college preparatory curriculum, and not smoking 
or drinking were predictive of high school graduation (Upchurch and 
McCarthy 1990). In Project Redirection, an intervention program for 
pregnant and parenting adolescents, of those who had dropped out 
of school, one-half had done so before the pregnancy occurred (Polit 
et al. 1988). 

Diane Scott-Jones is associate professor in the Department of 
Educational Psychology and the Department of Psychology at the 
University of Illinois at Urbana-Champaign. Her research interests 
include social development, family processes, the development of minorities, 
and social policy issues for children, youth, and families. 

Once a pregnancy occurs during adolescence, the adolescent may 
be diverted into a life course that emphasizes childbearing and child rearing, 
to the exclusion of continued schooling. The adolescent mother's 
subsequent pattern of childbearing may affect educational attainment. 
Although the difference in completed family size between adolescent 
and older childbearers has declined, women who have their first pregnancy 
during adolescence have more children than do women who 
have their first pregnancy after the adolescent years (Teachman 1985). 
In 1985, 1 percent of 15-19-year-old females had a repeat birth (Children's 
Defense Fund 1988). For a majority of participants in an intervention 
program focusing specifically on the prevention of a second 
pregnancy during adolescence, a second pregnancy occurred within 
two years of the first (Polit and Kahn 1986). 

Adolescent childbearers currently are less likely to be married than 
were adolescent childbearers in the past (National Center for Health 
Statistics 1990). Adolescents in general are unlikely to be married, and 
there are racial differences in marriage rates. In 1985, of all 15-19-year-olds, 
11.2 percent of Hispanics, 7.6 percent of whites, and 1.6 
percent of blacks were married (Children's Defense Fund 1988). Almost 
two-thirds of white adolescents and almost all black adolescents having 
first births become pregnant outside marriage (Furstenberg et al. 1989). 
The role of marriage in educational outcomes for adolescent mothers 
may not be positive, however. Married black and Hispanic adolescent 
mothers are not likely to remain in school; further, their divorce rate 
is high (Children's Defense Fund 1990; McLaughlin et al. 1986). Sim- 
ilarly, black and white males who marry in adolescence have lower 
educational attainment and more marital disruption than do males 
who marry in adulthood; the relative educational disadvantage of 
having married in adolescence remains throughout the adult life course 
(Teti et al. 1987). The presence or timing of marriage was not associated 
with educational attainment of adult black and white women who had 
their first birth in adolescence (Teti and Lamb 1989). 

Whether marriage occurs or not, adolescent mothers' continuing 
their schooling may be influenced by the level of education their male 
partners have attained. Adolescent fathers have higher dropout rates 
than their peers who have not fathered children; dropout rates are 
higher for married adolescent fathers than for those who are unmarried 
(Marsiglio 1987). The fathers of children born to adolescent mothers 
are not always themselves adolescents, however. In 1985, 37 percent 
of adolescent mothers who gave birth did not identify the father. Only 
\H percent of the fathers were reported to be 15-19 years of age; the 
remaining fathers were 20 years or older (National Center for Health 
Statistics 1987). 

This study employs the 1985 birth records of a midwestern state, 
Illinois. In 1985, Illinois ranked fifth among the states in the nation 
in the numbers of births to adolescents; however, the birthrate of 
15-19-year-olds in Illinois in 1985, 49.3 births per 1,000 females, is 
not substantially different from the 1985 national rate of 51.2 births 
per 1,000 females (Children's Defense Fund 1988). 

Illinois birth records were used to answer the following questions 
about the educational levels of adolescent childbearers. (1) Does the 
educational level of adolescents experiencing a second birth differ 
significantly from those experiencing a first birth, among white, black, 
and Hispanic adolescents of different ages? If adolescent pregnancy 
results in a decline in educational level, then adolescents experiencing 
a second birth should have a lower educational level than that of first- 
time childbearers. (2) Is the educational level of adolescent childbearers, 
at the time of birth, significantly below that expected for their age? 
If low educational achievement is an antecedent of early childbearing, 
then educational level should be below age norms at the time of a first 
birth, as well as at the time of a second birth. (3) Is educational level 
different for married and unmarried childbearers? Married adolescents 
were expected to have lower educational levels than unmarried adolescents. 
(4) Among second-time childbearers, does the outcome of 
the first pregnancy affect educational level? It was expected that second-time 
childbearers whose first pregnancy resulted in a live birth would have 
lower educational levels than those whose first pregnancy did not 
result in a live birth. (5) Are age and educational level of fathers related 
to mothers' educational levels? It was expected that older and less 
educated fathers would be associated with lower educational levels for 
adolescent childbearers. 

Method 

Data Source 

Data used are 1985 birth records for the state of Illinois from the 
Illinois Department of Public Health, Vital Records. In Illinois in 1985, 
179,004 births were recorded; mothers ranged in age from 11 years 
to 48 years. Of these births, 12.7 percent were to adolescents 19 years 
of age or younger; .3 percent were to adolescents younger than 15 
years of age. For blacks, 26.8 percent of all births were to adolescents, 
for Hispanics, 14.9 percent, and for whites, 8.9 percent. 

Of first births in 1985, 26.4 percent were to adolescents 19 years of 
age or younger. For blacks, however, 54.5 percent of first births were 
to adolescents, for Hispanics, 36.2 percent, and for whites, 18.1 percent. 
For first births in 1985, the mean age of mothers was 24.5 years for 
whites, 20.2 years for blacks, and 21.8 years for Hispanics. 

In 1985, 22,667 births were to adolescents. Of these, two-thirds 
(15,276) were first births. Approximately one-fourth (5,501) were second 
births. Third or later births were 8 percent (1,890) of births to ado- 
lescents. Table 1 shows the distribution of first and second births to 
adolescents, by age and ethnic group. (Number of cases in the analyses 
reported in the Results section may differ slightly from those in table 
1 because of missing data.) Blacks have a disproportionately high 
percentage of first births (40 percent) and second births (51 percent). 
Both first and second births to adolescents increase with age within 
all ethnic groups. The age distribution of adolescent births differs 
among the three ethnic groups, however. Younger adolescents account 
for a higher percentage of births to blacks than to the other two 
groups; younger adolescents account for a higher percentage of births 
to Hispanics than to whites. 
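
The birth-order shares quoted above can be checked with simple arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
# Birth counts for Illinois adolescents in 1985, as reported in the text.
births = {"first": 15_276, "second": 5_501, "third_or_later": 1_890}
total_reported = 22_667

# The three categories should sum to the reported total.
total = sum(births.values())
assert total == total_reported

# Share of each birth order among all births to adolescents.
shares = {order: count / total for order, count in births.items()}
for order, share in shares.items():
    print(f"{order}: {100 * share:.1f} percent")
```

The computed shares agree with the rounded descriptions in the text: about two-thirds first births, about one-fourth second births, and about 8 percent third or later births.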

The marital status of adolescents at first and second births, by age 
and ethnic group, is presented in table 1. The percentage of adolescents 
who are married increases with age within each ethnic group. By the 
age of 19 years, more than half of white and Hispanic adolescents 
having first or second births are married. For blacks, however, the 
percentage married ranges from practically none at the age of 15 
years to less than 10 percent at the age of 19 years. 

Because there is no measure of socioeconomic status, race/ethnic 
origin of mothers may be confounded with socioeconomic status in 
the analyses. Further, in this data set, race/ethnic origin of adolescent 
mothers is confounded with rural-urban residence. The majority of 
births to white adolescents, 79 percent of first births and 81 percent 
of second births, were to those residing in small cities and townships. 
The majority of births to black and Hispanic adolescents were to those 
residing in Chicago, the only major urban center in the state, with a 
population exceeding 3 million. For blacks, 70 percent of first births 
and 69 percent of second births were to adolescents residing in Chicago; 
for Hispanics, 77 percent of first births and 73 percent of second 
births were to adolescents residing in Chicago. 

Variables employed in this study included race/ethnic origin of mother, 
age of mother, age of father, educational level of mother, educational 
level of father, mother's marital status, and total number of children 
born to the mother. Most of the data are self-reported and are subject 
to the usual limitations of self-report data. Birth data, however, are 
generally considered highly reliable (Hayes 1987). These inclusive 
birth records are important, particularly for examining whether ed- 
ucational level is low prior to the occurrence of a first pregnancy. 
Because of the small number of births to adolescents younger than 
15 years, the major analyses of this study employed the birth records 
of 15-19-year-old adolescents. 

Results 

First and Second Births 

Analyses of variance were conducted to determine whether educational 
level differed for adolescents experiencing a second birth and those 
experiencing a first birth in the three ethnic groups. Separate two 
(birth: first, second) x three (ethnic group: white, black, Hispanic) 
analyses of variance were conducted for each of the five age groups 
from 15 to 19 years. Analyses were conducted separately by age because 
the maximum possible years of schooling for adolescents to have com- 
pleted varies within this age range. Births to adolescents younger than 
15 years were not included because of the small numbers. Mean years 
of education at first and second births, by age and ethnic group, are 
presented in table 2. 

For 15-year-olds, there were no significant main effects of first or 
second birth and no significant interaction. There was a significant 
main effect of ethnic group (F (2, 1,254) = 14.62, P < .0001). Tukey's 
studentized range (HSD) test indicated that Hispanics had significantly 
lower educational levels than did blacks and whites. 



[Table 2: Mean years of education at first and second births, by age and ethnic group] 

For each of the remaining age groups, there were significant first 
and second pregnancy x ethnic group interactions; for 16-year-olds, 
F (2, 2,631) = 11.9, P < .0001; for 17-year-olds, F (2, 4,083) = 15.4, 
P < .0001; for 18-year-olds, F (2, 5,575) = 22.3, P < .0001; and for 
19-year-olds, F (2, 6,536) = 8.43, P < .0002. 

Post hoc pairwise comparisons of means were conducted. All effects 
reported were significant at P < .0001. For 16-year-olds, Hispanics 
had significantly lower educational levels than did whites or blacks; 
adolescents having second births had significantly lower educational 
levels than did those having a first birth among Hispanics only. For 
the remaining age groups, Hispanics having a first birth had significantly 
lower educational levels than did whites or blacks having first or second 
births; numbers of Hispanics having second births were significantly 
lower than Hispanics having first births. 

For both 17- and 18-year-olds, blacks having a first or second birth 
and whites having a first birth were not significantly different from 
one another and had significantly higher educational levels than whites 
having a second birth. For 19-year-olds, blacks and whites having a 
first birth were not significantly different from one another and had 
significantly higher educational levels than did the remaining groups. 
Blacks having a second birth had significantly higher educational levels 
than did whites having a second birth. 

Comparisons to National Median Educational Levels 

The years of education of adolescents having a first birth and those 
having a second birth were compared to national norms for educational 
attainment for white and black female adolescents in 1985 from the 
U.S. Bureau of the Census (1987). For Hispanics, data were not available 
for females only; therefore, the educational attainment of Hispanic 
adolescent childbearers was compared to norms for Hispanic females 
and males combined. Males are more likely than females to have been 
retained in a grade (Dryfoos 1990); therefore, the comparisons for 
Hispanic childbearers may underestimate their difference from Hispanic 
females in general. 

To test whether the educational level of adolescents having a first 
birth and those having a second birth differed significantly from the 
national median for the age-ethnic group, difference scores were calculated. 
The educational level of the adolescent childbearer was subtracted 
from the national median for her age-ethnic group. Separate 
t-tests were conducted to determine whether the difference scores were 
significantly different from 0. The mean difference scores are presented 
in table 3. 

TABLE 3 
Difference between Median National Educational Levels and Educational 
Levels of Adolescent Mothers 

                WHITE                 BLACK                HISPANIC
         First      Second      First      Second      First      Second
AGE      Birth      Birth       Birth      Birth       Birth      Birth
15       NS         NS          .26***     .22**       NS         NS
16       -.21***    -.53***     ...        .19***      -.57***    -1.67***
17       -.41***    -1.05***    ...        -.24***     -1.02***   -1.71***
18       -.72***    -1.33***    -.35***    -.55***     -1.57***   -2.54***
19       -.80***    -1.48***    -.56***    -.93***     -2.20***   -2.79***

** P < .01. 
*** P < .001. 

For 15-year-old adolescents having first and second births, educational 
level was not significantly lower than the national median; for black 
adolescents, educational level was significantly higher than the national 
median, although the magnitude of the difference was small. For 16-year-old 
black adolescents having first and second births, educational 
level was significantly higher than the national median, although, again, 
the magnitude of the difference was small. For both first-time and 
second-time childbearers, in all other age-ethnic groups, educational 
level was significantly lower than the national median. 
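
The difference-score test described in this section can be sketched as a one-sample t-test against zero. The data below are hypothetical, as is the national median of 11 years; the sketch assumes scores are computed as the mother's years of schooling minus the national median, so that negative values indicate a level below the median, matching the signs reported in table 3. It returns the t statistic and degrees of freedom only, and a P value would still have to be read from a t distribution.

```python
import math
from statistics import mean, stdev


def one_sample_t(diffs):
    """Return (mean difference, t statistic, degrees of freedom) for H0: mean = 0."""
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two difference scores")
    m = mean(diffs)
    s = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("difference scores have zero variance")
    t = m / (s / math.sqrt(n))
    return m, t, n - 1


# Hypothetical data: years of schooling for eight mothers in one
# age-ethnic group, and a hypothetical national median of 11 years.
years_completed = [10, 11, 9, 10, 11, 10, 8, 11]
national_median = 11

# Sign convention assumed here: mother's years minus the national median,
# so negative scores mean the mother is below the median.
diffs = [y - national_median for y in years_completed]
m, t, df = one_sample_t(diffs)
print(f"mean difference = {m:.2f}, t({df}) = {t:.2f}")
```

One such test would be run separately for each age-ethnic group and birth order, which is why table 3 reports one mean difference per cell.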

Marital Status 



Mean years of education for married and unmarried adolescents at 
first and second births, by age and ethnic group, are presented in 
table 4. Births to adolescents younger than 18 years were not included 
because of the small numbers of married adolescents, especially among 
blacks. To determine whether educational level differed for married 
and unmarried adolescents, a three (ethnic group) x two (marital 
status) x two (birth: first or second) x two (age: 18 or 19 years) analysis 
of variance was conducted. Significant interactions were found for 
marital status and ethnic group (F (2, 12,113) = 96.9, P < .0001), 
marital status and first and second births (F (1, 12,114) = 29.1, P < 
.0001), and ethnic group and first and second births (F (2, 12,113) = 
15.68, P < .0001). 

Post hoc pairwise comparisons of means were conducted. All significant 
effects reported below were significant at P < .0001. For 18- 
and 19-year-olds having first and second births, married Hispanic 
adolescents had significantly lower educational levels than all other 
groups; unmarried Hispanics were significantly lower than black and 
white married and unmarried adolescents. Among Hispanics, marriage 
was associated with the loss of approximately one year of education. 
For first births at 18 and at 19 years, married blacks had significantly 
higher educational levels than did all other groups; unmarried blacks 
and unmarried and married whites were not significantly different 
from one another. For second births at 18 and at 19 years, married 
and unmarried blacks did not differ significantly from each other and 
had significantly higher educational levels than did married and un- 
married whites, who were not different from each other. 

First-Birth Outcomes and Second Births 

To determine whether the educational level of adolescents having a 
second birth varied according to the outcome of the first birth, com- 
parisons were made between those whose first pregnancy resulted in 
a live birth and those whose first pregnancy did not. Adoption is rarely 
chosen by adolescent childbearers (Dryfoos 1990); therefore, one can 
assume that adoption was rarely the outcome for adolescents whose 
first pregnancy resulted in a live birth. Adolescents having a second 
birth were omitted from this analysis if their first pregnancy resulted 
in a live birth but the child was not living at the time of the second 
birth; 1.4 percent of the second births were omitted for this reason. 

Among white adolescents having a second birth, the first pregnancy 
did not result in a live birth for 67 percent of 15-year-olds, 58 percent 
of 16-year-olds, 46 percent of 17-year-olds, 37 percent of 18-year-olds, 
and 32 percent of 19-year-olds. Among blacks, the percentages were 
32 percent for 15- and 16-year-olds, 26 percent for 17-year-olds, 24 
percent for 18-year-olds, and 22 percent for 19-year-olds. For Hispanics, 
the percentages were 18 percent for 15-year-olds, 27 percent for 16- 
year-olds, 17 percent for 17-year-olds, 23 percent for 18-year-olds, 
and 19 percent for 19-year-olds. 

Separate two (outcome of first pregnancy) x three (ethnic group) 
analyses of variance were conducted for each of the five age groups 
from 15 through 19 years. For 15-year-olds, there were no significant 
effects; however, the sample size was small (n = 119). For 16-year- 
olds, there was a main effect of ethnic group (F (2, 428) = 41.4, P < 
.0001). Tukey's studentized range (HSD) test indicated that Hispanics 
were significantly lower than blacks and whites; the latter two groups 
were not significantly different from one another. 

Significant first-pregnancy outcome x ethnic group interactions occurred 
in the separate ANOVAs for 17-year-olds (F (2, 931) = 8.0, 
P < .0004), 18-year-olds (F (2, 1,647) = 5.49, P < .004), and 19-year- 
olds (F (2, 2,243) = 2.9, P < .05). Pairwise comparisons of means 
following the separate ANOVAs for 17-, 18-, and 19-year-olds indicated 
that, for both whites and Hispanics, those whose first pregnancy resulted 
in a live birth had significantly lower educational levels than those 
whose first pregnancy did not result in a live birth; educational level 
was not significantly different for blacks whose first pregnancy resulted 
in a live birth and blacks whose first pregnancy did not result in a live 
birth. The educational level of blacks was significantly higher than 
that of every other group except for whites whose first pregnancy did 
not result in a live birth. 

Educational Levels of Fathers 

Of all births in 1985, 12.7 percent were to females 19 years of age or 
younger, but only 4.8 percent of fathers of all births were 19 years of 
age or younger. Of the adolescent fathers, 74 percent were 18 or 19 
years of age. Table 5 presents the mean age and educational level of 
fathers of births to adolescent females, by mother's age and mother's 
ethnicity. Father's age and educational level were not reported by some 
adolescent mothers. The percentage of 15-19-year-old mothers in 
the three ethnic groups reporting father's age ranged from 79 percent 
to 85 percent; the percentage reporting father's educational level ranged 
from 54 percent to 73 percent. 

The reported ages of fathers ranged from 12 years to 64 years. At 
each age level, the mean age of fathers reported for Hispanic adolescents 
was approximately one year greater than father's age reported for 
white adolescents, which was slightly higher than that reported for 
black adolescents. At each age level, the mean age of fathers was 
approximately four years greater than mother's age for Hispanics and 
approximately three years greater for whites and blacks. Father's ed- 
ucational level reported for Hispanics is lower than that reported for 
whites and blacks, and the difference is greater for older than for 
younger adolescents. 

Pearson product-moment correlations indicated that father's edu- 
cational level is positively correlated with mother's educational level 
within the three ethnic groups for all age groups combined (for whites, 
r = .36, P < .0001; for blacks, r = .39, P < .0001; for Hispanics, r = 
.57, P < .0001). The relationship of father's age to mother's educational 
level varies for the three ethnic groups (for whites, r = 0; for blacks, 
r = .17, P < .0001; for Hispanics, r = -.15, P < .0001). 
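
For readers who want to see how a Pearson product-moment correlation of this kind is computed, the sketch below implements the standard formula directly. The six mother-father pairs are hypothetical values chosen for illustration and are not drawn from the Illinois birth records.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical years of schooling for six mother-father pairs.
mother_years = [9, 10, 10, 11, 12, 12]
father_years = [10, 10, 12, 11, 12, 13]
r = pearson_r(mother_years, father_years)
print(f"r = {r:.2f}")
```

In the study, a coefficient of this kind would be computed within each ethnic group, using only the births for which the father's educational level was reported.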



Discussion 



This study found age and ethnic differences in the relationship of first 
and second births to educational levels of adolescents. Marriage was 
differentially related to educational level among the three ethnic groups. 
The educational level of fathers, when reported, was significantly correlated 
with the adolescent childbearer's educational level. These findings 
have implications for future research and for programs and policies 
related to adolescent pregnancy prevention and to educational im- 
provement. 

Young adolescent childbearers were not as far behind the national 
median for their age-ethnic group as were older adolescent childbearers. 
An educational trajectory different from that of their age-ethnic group 
was not evident for 15-year-olds having a first or second birth in any 
ethnic group. Further, at 15 and 16 years of age, black adolescents 
having first and second births showed a small but statistically significant 
increase over the national medians. These findings suggest that in- 
terventions to maintain adolescents' engagement in school need to be 
instituted in early adolescence, when, even with a first or second birth, 
adolescents are not behind national medians for educational attainment 
for their age. The requirement of compulsory school attendance until 
age 16 years may play a role in young adolescent childbearers' remaining 
in school. Additional policies and programs, such as child-care programs 
requiring the adolescent mother to remain in school, may boost the 
educational attainment of older adolescent childbearers. 

The finding that the youngest adolescent childbearers were not 
below national medians in educational attainment also suggests that 
young adolescents generally have trouble educationally, for reasons 
other than the occurrence of a first or second birth. The educational 
level of young adolescents, particularly blacks and Hispanics, is cause 
for concern, without the occurrence of pregnancy. In general, black 
and Hispanic students have a higher probability of being retained in 
grade than do white students (Dryfoos 1990). Of eighth graders in 
the National Education Longitudinal Study, 18 percent reported having 
repeated at least one grade. In that study, approximately 26 percent 
of black and 23 percent of Hispanic eighth graders reported having 
repeated at least one grade, in contrast to approximately 16 percent 






of whites (Hafner et al. 1990). Preventing educational failure is needed 
for young adolescent childbearers and for young adolescents generally. 

Difference from the national medians became more pronounced 
for older adolescent childbearers. Adolescents having a first child at 
the age of 19 years could have completed high school prior to the 
pregnancy, if they had progressed through school on schedule. In all 
three ethnic groups, however, the average educational level for 19- 
year-olds having a first birth was less than 12 years. Hispanic adolescents 
having a first birth at the age of 19 years were more than two years 
behind the national median for 19-year-old Hispanics. These adolescents 
had completed, on average, less than 10 years of schooling. These 
findings strongly suggest that educational difficulties occurred prior 
to the experience of pregnancy. 

The difference in educational level between adolescents having a 
first birth and those having a second birth varied with age and ethnic 
group. Among 15-year-olds and, with the exception of Hispanics, 16-year-olds, 
adolescents having a second birth did not have significantly 
lower educational levels than those having a first birth. For 17- and 
18-year-olds, second births were associated with lower educational 
levels relative to first births for Hispanics and whites. For 19-year-olds, 
second births were associated with lower educational levels relative 
to first births for Hispanics, whites, and blacks. Black adolescents, who 
accounted for more than one-half of second births, were least affected 
educationally by second births. 

The sharp ethnic differences in this study are consistent with existing 
literature. Hispanic childbearers had significantly lower educational 
levels than blacks or whites, within each age level of 15-19-year-olds. 
Black adolescent childbearers deviated less from the national median 
educational level for their age, ethnic, and gender group than did 
whites or Hispanics. One possible explanation involves the differential 
rates of childbearing among blacks, whites, and Hispanics nationally. 
A higher proportion of black adolescents become childbearers than 
do whites or Hispanics; therefore, a higher proportion of childbearers 
would be included in the calculation of the national median educational 
level for blacks than for whites or Hispanics. This could not account 
entirely for the findings for blacks, however. Nationally, a larger proportion 
of older black adolescents become childbearers than do younger 
black adolescents; yet older black adolescents in this study differ more 
from national medians for their age groups than do the younger black 
adolescents. 

Another possible explanation for adolescent mothers' educational 
progress is the availability of special programs. Both black and Hispanic 
adolescent mothers, however, were predominantly residents of Chicago, 
a large urban area where one would expect to find many services for 
adolescent childbearers. Yet, the black and Hispanic childbearers were 
different in educational attainment. Another possibility is that available 
programs may be targeted toward blacks more than toward other 
ethnic groups; alternatively, programs may come to be identified informally 
by community members as "black" programs and may not 
be used extensively by other groups who need services. Another possible 
explanation is that black adolescent childbearers receive more support 
in a variety of ways in families and communities. 

Whatever the reason for black adolescent childbearers' seeming re- 
silience, their educational level is even more striking when one considers 
the overall educational difficulties of black and Hispanic students. The 
educational level of the black adolescent childbearers is not satisfactory, 
however. The mean educational level for 19-year-old first-time black 
childbearers was less than graduation from high school, which is un- 
satisfactory given that even graduation from high school is not likely 
to improve substantially the economic conditions of adult women (Moore 
and Wertheimer 1984). 

Among Hispanics and whites, the percentage of childbearers who 
were married increased with age; approximately one-half of 18- and 
19-year-old childbearers in these two ethnic groups were married.
Although married childbearers also increased with age for blacks,
marriage occurred infrequently for older black adolescent child-
bearers and hardly occurred at all for younger black adolescent child-
bearers. For 18- and 19-year-olds, marriage was associated with lower
educational levels for Hispanics, was associated with higher educational
levels for blacks having first births, and was not associated with edu-
cational level for blacks having second births or for whites. Thus,
although blacks were much less likely to be married than were whites
or Hispanics, marriage was associated with increased schooling for
black adolescents at first birth.

The role of marriage in the developmental life course of adolescent
childbearers is especially problematic. American society places a high
value on advanced education and on marriage. Adolescent childbearers
may not be able to combine school attendance and marriage successfully.
Black adolescent childbearers appear to manage schooling relatively
more successfully than they do marriage; Hispanic adolescent child-
bearers are more likely than blacks to marry but not as likely to manage
school well.

The outcomes for Hispanic childbearers, with lower educational
levels at first and second births and further depression of educational
levels with marriage, should be studied further. Research should focus
on specific subgroups labeled Hispanic. Data from the National Center
for Health Statistics indicate that, in 1985, births to Hispanic adolescents
nationally were not evenly distributed across various subgroups of 
Hispanics. Of births to Hispanic adolescents, 69 percent were to Mex- 
icans, 12 percent were to Puerto Ricans, and 1 percent to Cubans 
(Children's Defense Fund 1988). 

A relatively high percentage of adolescent childbearers provided 
information about the father. Fathers of births to Hispanics tended 
to be older and less educated than did fathers of births to blacks or 
whites. As expected, the majority of fathers were older than 19 years; 
the average age of fathers of births to blacks and whites was three 
years greater than mother's age and, to Hispanics, four years. The 
educational level of fathers, when it was reported, showed a surprisingly 
strong correlation with the educational level of the mothers. Because 
these fathers are older, it may be difficult to reach them directly with 
typical school programs. School-based programs will thus miss a major 
actor in the scenarios that lead to unplanned early pregnancy. Further, 
the marriage prospects of adolescent childbearers may be tied to the 
status of males in early adulthood. Bowman (1990) has found that a 
high proportion of black young adult males are jobless and experience 
a phenomenon called "job-search discouragement," which affects young 
males' capacity to provide material and emotional support for a family. 
More research is needed on the male partners involved in adolescent 
pregnancy. As Parke and Neville (1987) point out, we know little about 
the patterns of involvement between male partners and adolescent 
mothers and little about differences between adolescent and older 
male partners. 

These results suggest the importance of studying the changing pat- 
terns of childbearing in white, Hispanic, and black adolescents' lives. 
Changing patterns of childbearing should be studied in relation to 
other transitions adolescents make, especially regarding the completion 
of schooling and marriage. Programs to increase educational achieve- 
ment and programs to prevent unplanned pregnancies indirectly by 
focusing on educational and other life options need careful evaluations. 
Evaluations should identify the effective components of such programs 
and determine whether such programs are differentially used by or 
are differentially successful with various ethnic and age groups. Programs 
that focus on life options, however, have not yet produced rigorous 
evaluations (Hofferth 1987), and programs that focus generally on 
improving educational outcomes have not assessed adolescent pregnancy 
as part of their evaluations (Hayes 1987). Intervening in early ado- 
lescence, when childbearing is relatively infrequent, will probably be 
maximally effective in preventing unplanned pregnancies and in fos- 
tering the educational progress of adolescents who become pregnant. 
Note



The source of data is Illinois Department of Public Health, Vital Records,
1980-88 sterilized birth tapes; the data analyses and interpretations, however,
are the author's and do not reflect the official position of the Illinois Department
of Public Health. The work reported herein was supported under the Educational
Research and Development Center Program (agreement no. R117Q00031)
as administered by the Office of Educational Research and Improvement,
U.S. Department of Education, in cooperation with the U.S. Department of
Health and Human Services. The findings and opinions expressed in this
article do not reflect the position or policies of the Office of Educational
Research and Improvement, the U.S. Department of Education, or the U.S.
Department of Health and Human Services. I acknowledge the assistance of
Mary Zwoyer Anderson in data analyses. The author thanks Joyce Epstein,
Anne C. Petersen, and three anonymous reviewers for their helpful comments
on this article.
Explaining Within-Semester Changes in Student Effort
in Junior High School and Senior High School Courses 
Within any course, as a semester progresses some students reduce their effort and others try
harder. Virtually every cognitive theory of motivation suggests that changes in ability perceptions
partially determine these changes in effort. Researchers have also cited changes in students'
valuing of the course and changes in extrinsic pressures as determinants of effort changes.
Covariance structure modeling was used to test 4 alternative models concerning the determinants
of effort in a sample of 167 junior high school and 155 senior high school students. Models
specifying a direct effect of ability-perception change on effort change fit the data better than did
models specifying only indirect effects or no effect of ability perceptions on effort. Ability- 
perception changes also directly affected students' valuing of the subject matter. The results 
emphasize the importance of helping students develop confidence in their abilities. 



Within almost any course, as a semester progresses some 
students reduce their effort and others try harder. What factors
are responsible for within-semester changes in student effort? 
In this study, we test theoretical models concerning the deter- 
minants of student effort during junior high school and senior 
high school. 

During the past two decades, ability perceptions have come 
to play a central role in many theories of human motivation 
and action (e.g., Bandura, 1977; Covington & Beery, 1976;
Dweck, 1986; Kukla, 1978; Meyer, 1987; Nicholls, 1984;
Raynor & Brown, 1985). For example, according to self-
efficacy theory, individuals with self-percepts of low ability
are easily discouraged by failure to attain the standards they
set for themselves, whereas those who are confident of their
ability typically intensify their efforts when failure occurs and
persist until they succeed (Bandura & Cervone, 1983). Self-
worth theorists (Covington & Beery, 1976) have claimed that
students who lose confidence in their ability may adopt coun- 
terproductive, effort-avoidant strategies so that failure, if it 
occurs, can be blamed on insufficient effort rather than on 
low competence. According to attribution theory, attributing 
failure to low ability is among the causes of learned helpless-
ness and depression (Abramson, Seligman, & Teasdale, 1978;
Weiner, 1986). 

The emergence of self-perceptions of ability as a cornerstone 
of theories dealing with achievement-related behavior is not 
surprising given the importance of the expectancy construct 
in the pioneering theoretical work of Atkinson (1957). Atkin-
son defined expectancy for success at a task in terms of
perceptions of the probability of success. He and his colleagues 
(e.g., Atkinson & Birch, 1978) have typically operationalized 
subjective expectancy for success as "perceived task facility" 
rather than as "perceived self-concept of ability" (Reuman, 
1986). That is, Atkinson emphasized the role of easy tasks
(rather than the role of high-ability perceptions) in producing 
high expectancies. Consequently, the most frequently adopted 
strategy for manipulating expectancies in classic experimental 
research on persistence and choice (e.g., Feather, 1960) has
been to supply subjects with information about the normative 
difficulty of the task they are attempting. Recent reformula-
tors of Atkinson's achievement motivation theory (e.g., Ray-
nor & Brown, 1985; Reuman, 1986) have pointed out that
high-ability perceptions lead to higher expectancies for success 
than do low-ability perceptions. Recent formulations, there- 
fore—though not disregarding the role of task facility in 
influencing expectancies — have emphasized the role of per- 
sonal ability perceptions. 

The positive role of ability perceptions in influencing effort, 
especially in the face of difficulty, has been confirmed in 
several empirical studies. For example, Helmke (1987) found
that students' math-ability perceptions at the end of fifth 
grade had a positive impact on the quality of students' later 
efforts (e.g., on their perseverance and on their active engage- 
ment during instruction in sixth grade). Brown and Inouye's 
(1978) data indicated that the higher students' expectancies 
were concerning their ability to solve anagrams, the longer 
they persisted on anagrams for which they were unable to 
find solutions. Likewise, Hallerman and Meyer (1978, cited 
in Meyer, 1987) found that perceived ability was strongly 
predictive of the persistence of teenage students on insoluble
achievement tasks, regardless of whether the tasks were por-
trayed as normatively easy or normatively difficult. Students
who perceived their ability for the achievement task as high 
exhibited high persistence at both "easy" and "difficult" tasks. 
Furthermore, there is evidence that attributing one's learning
difficulties to insufficient ability leads to decreased persistence
(Andrews & Debus, 1978; Diener & Dweck, 1978; Licht,
Kistner, Ozkaragoz, Shapiro, & Clausen, 1985; Weiner, 1979).
For instance, Licht et al. (1985) found that, for both learning-
disabled and non-learning-disabled children, the tendency to
attribute one's failures to insufficient ability was negatively 
related to persistence on a reading task. 

Although ability perceptions, especially as they determine 
one's expectancy for success on an achievement task, have 
been central in theories of achievement motivation, the value
of a task to the individual is also assumed to influence his or
her effort on that task. In Atkinson's (1957) theory, task value
is narrowly defined in terms of the incentive value of suc-
cess: the amount of pride one expects to experience if one
succeeds. Difficult tasks are assumed to have higher incentive
value than easy tasks. More recently, Parsons and Goff (1980),
Reuman (1986), and others (e.g., Feather, 1988) have sug-
gested that there are other reasons for valuing an achievement
activity in addition to the pride one feels if one succeeds.
According to Reuman (1986), these include "the inherent,
immediate enjoyment one gets from developing, mastering
or using a skill involved in the activity" (p. 92), that is, the
intrinsic or interest value of the activity, and "the importance
of the activity for some future goal" (p. 93), that is, the utility 
value of the activity. 

Different relationships between expectancies and task val- 
ues might be predicted depending on how one defines these 
two constructs. In the classic experimental research, expectan- 
cies for success were defined in terms of the inverse of the 
normative difficulty of the task, and only the incentive value
of the task was considered. Expectancies for success ($P_s$) and
the incentive value of success ($I_s$) were assumed to be perfectly
inversely related; that is, $P_s = 1 - I_s$. In contrast, we propose
that when expectancies for success are defined in terms of 
ability perceptions and the value of a task is defined in terms 
of its intrinsic and utility value, the relationship between 
expectancies and values will be strongly positive. 

Our proposal is similar to Ryan, Connell, and Deci's (1985)
proposition that "any event that enhances perceived compe-
tence will tend to enhance intrinsic motivation, while those
that facilitate the perception of incompetence will diminish
intrinsic motivation" (p. 17). In support of this proposition,
Ryan et al. (1985) cited studies indicating that students who
are provided with positive performance feedback concerning
their competence on a task display higher levels of intrinsic
motivation for the task than do students who don't receive
performance feedback (Harackiewicz, 1979; Ryan, Chandler,
Connell, & Deci, 1983). Furthermore, Vallerand and Reid
(1982) reported evidence from a path analysis that suggests a 
causal link between feelings of competence in an activity and 
intrinsic motivation for that activity. In another study involv- 
ing causal modeling techniques, Harter and Connell (1984) 
found that the structural equation models that best fit their
data specify that pupils who evaluate their academic compe-
tence positively are more likely than others to be intrinsically
motivated to engage in academic tasks. 

There are at least two reasons to expect that ability percep- 
tions also have a moderate positive effect on students' percep- 
tions of the utility value of a course. First, students who 
believe that they are unable to master the knowledge and 
skills taught in a course may understandably question the 
course's usefulness to them (e.g., "If I can't master it, how 
will it help me in the future?"). Second, students tend to select
career goals that require those talents that they think they 
have rather than those talents that they think they don't have. 
As a result, those activities that are perceived by students as 
useful in helping them reach their long-range goals also tend 
to be those activities at which they feel at least moderately 
talented. 

In addition to the determinants of effort already mentioned, 
theories of achievement motivation emphasize that extrinsic 
pressures for achievement also influence effort (Atkinson,
1964; Ryan et al., 1985). For example, one might study hard 
in an attempt to please one's parents, even if one's ability 
perceptions or value perceptions in a course are low. There- 
fore, one issue we examined in this study is whether increases 
in the perceived importance of extrinsic pressures lead to 
increases in student effort. 

In summary, effort on school tasks is assumed to be affected 
by ability perceptions, task-value perceptions, and perceptions 
of extrinsic pressures. In addition, higher ability perceptions
are expected to lead to higher task-value perceptions. In our 
study, we tested these hypotheses. We also examined the 
relative importance of these factors in influencing effort and 
the nature (e.g., direct vs. indirect) of their effects. 

In the analyses that follow, we evaluate four alternative 
models concerning the determinants of student effort in a 
course during junior high school and senior high school. Each
model incorporates specific hypotheses concerning the causal 
relations among the following five correlated factors: (a) 
Change in Effort, (b) Change in Self-Concept of Ability, (c) 
Change in the Intrinsic Value of the Subject Matter, (d)
Change in the Utility Value of the Course, and (e) Change in 
the Importance of Extrinsic Pressure for Achievement. Each 
model assumes that change in the importance of extrinsic 
pressures is a direct cause of change in effort. The models 
differ with regard to the roles in influencing effort that are 
attributed to changes in self-concept of ability, intrinsic value, 
and utility value. For example, one model assumes that the 
causal linkage between self-concept of ability and effort is a 
direct one. We compare this model with a model that assumes 
that the causal link between self-concept of ability and effort 
is mediated entirely through intrinsic value and utility value. 
A third model posits both direct and indirect causal links. 
Finally, we compare each of these models with a model that 
denies self-concept of ability a causal role in influencing effort; 
in this alternative model, change in self-concept of ability is 
viewed as a consequence rather than as a cause of changes in 
intrinsic value, utility value, and effort. 

We included (and analyzed separately) data from both 
junior high school and senior high school students because
the role of particular factors may differ in the middle grades
and the high school grades. Utility value was expected to be
more strongly associated with effort for high school students 
(who will confront occupational choices and endeavors sooner 
than will junior high school students), and the importance of 
extrinsic pressures was expected to be more strongly associated 
with effort for junior high school students (who are more
influenced by a desire to please parents than are high school
students). 

Method 

Students from two junior high schools and two senior high schools 
in southern California were recruited for this study. Each school had 
a diverse student body that included students from a broad spectrum
of social classes and ethnic groups. Initially 23 teachers volunteered 
for the study, but 3 changed their minds and dropped out before the 
study was completed. Of the teachers who completed the study, 40% 
were math teachers, 20% were English teachers, 15% were science
teachers, and the remaining 25% taught social studies or elective 
subjects (e.g., Spanish, computer education, photography). 

Participating teachers distributed parental permission forms to the 
students in one or two of their classes. Students who returned signed 
permission forms were allowed to participate in the study. The 
number of participating students varied considerably from classroom 
to classroom because some teachers issued daily reminders to bring
back the permission slips before "questionnaire day," whereas others
issued few or no reminders. The overall student participation rate
was about 60%. Of the participating students, 46% were White, 23%
were Hispanic, 13% were Black, and 18% were from other ethnic
groups; 46% of the students who participated were boys and 54%
were girls. 

To obtain multiple indicators of change, several measures of each 
construct were included on a survey questionnaire that was admin- 
istered to students twice: once within the first 2 weeks of the semester 
and once at the end of the semester. A total of 322 students (167 
junior high school students and 155 senior high school students) filled 
out both the beginning-of-semester and the end-of-semester question-
naires. Teachers were asked to complete an assessment of each
participating student's effort both at the beginning and at the end of 
the semester. These effort ratings by teachers were obtained for 282 
of the 322 students. 

The Appendix lists the items that were used to measure the five 
constructs assessed. With the exception of two effort items, all items 
have a response scale ranging from 1 to 7 with various anchors, as 
indicated in the Appendix. Each item focuses on the subject area of 
the specific course in which students were given the questionnaire. 
To maximize the construct validity of the change in effort factor, we 
combined information from two independent sources (student self-
report items and teachers' ratings) in measuring effort changes (see
Nunnally, 1978, p. 96). For each item in each factor, the difference
between a student's or teacher's rating at the end of the semester and
at the beginning of the semester was used as an estimate of change.
These change estimates were the basic data used in the analyses in
this article.¹
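
The change-score computation described in this paragraph can be illustrated with a minimal sketch. The function name `change_scores` and the sample ratings are hypothetical and introduced only for illustration; the sketch assumes each item is rated on the 1-to-7 scale at both administrations, so that every difference falls between -6 and +6.

```python
def change_scores(begin, end, low=1, high=7):
    """Return end-minus-beginning differences for paired item ratings.

    `begin` and `end` are equal-length lists of ratings on a low..high
    scale (assumed here to be 1..7, as for most questionnaire items).
    """
    if len(begin) != len(end):
        raise ValueError("ratings must be paired item by item")
    for rating in list(begin) + list(end):
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}..{high}")
    # Positive values mean the rating rose over the semester.
    return [e - b for b, e in zip(begin, end)]


# Hypothetical ratings for one student on four items of one factor.
beginning = [5, 6, 4, 7]
end_of_semester = [4, 6, 6, 1]
print(change_scores(beginning, end_of_semester))  # prints [-1, 0, 2, -6]
```

Each element of the returned list corresponds to one indicator of change for the factor in question.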

Results

Overview of Analysis Strategy

We used confirmatory factor analysis to assess the adequacy
of the proposed five-factor measurement model. Then we
conducted LISREL analyses to test the adequacy of several
alternative covariance structure models (models specifying
not only the factor structure but also the causal relations
among factors). In both types of analyses, we evaluated the
adequacy of hypothesized models by examining the congru-
ence between the covariance matrix generated by the hypoth-
esized model and the observed covariance matrix. We used
the Tucker-Lewis index (TLI) to assess whether the overall fit
(between the covariances generated by the hypothesized
model and the observed covariances) was good enough to
support the model (Tucker & Lewis, 1973). The TLI is the
only widely used goodness-of-fit index that is relatively inde-
pendent of sample size (Marsh, Balla, & McDonald, 1988).
Although there is not universal agreement on what constitutes
"good" fit, a value of .90 or better on the TLI is usually
considered acceptable (Marsh et al., 1988, p. 393). A TLI of
.90 indicates that the proposed model improves the null model
by 90% of the amount one would expect from a model that 
is precisely true. 
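
As a worked illustration of this interpretation, the TLI can be computed from the chi-square statistics and degrees of freedom of the null model and the hypothesized model with the standard formula TLI = (χ²_null/df_null - χ²_model/df_model) / (χ²_null/df_null - 1). The sketch below is illustrative only: the function name and all four input values are hypothetical, because the null-model statistics for this study are not reported in the text.

```python
def tucker_lewis_index(chi2_null, df_null, chi2_model, df_model):
    """Tucker-Lewis index from null-model and hypothesized-model chi-squares."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    # The denominator is the improvement expected from a model that is
    # precisely true, whose chi-square/df ratio has an expected value of 1.
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


# Hypothetical values: a null-model ratio of 5.0 and a model ratio of 1.30.
tli = tucker_lewis_index(chi2_null=1000.0, df_null=200,
                         chi2_model=234.0, df_model=180)
print(round(tli, 3))  # prints 0.925
```

A model whose chi-square/df ratio equals 1 yields a TLI of 1.0 under this formula, and a model that fits no better than the null model yields 0.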

Confirmatory Factor Analysis 

In the confirmatory factor analysis, each indicator of change 
listed in the Appendix was constrained to load only on the 
factor that it was designed to measure. The results of the 
analysis indicated that the hypothesized factor structure fits 
the data well (χ²/df ratio = 1.30, TLI = .93). Furthermore,
the factor loadings (given in the Appendix) and factor vari-
ances were large and statistically significant. (We adopted a
.05 probability level for all significance tests reported in this
article.) As anticipated, all correlations among factors (see
Table 1) were positive and, except for the correlation between
change in the importance of extrinsic pressures and change 
in self-concept of ability, significant. 

The means and standard deviations of the five factors
were estimated with Bollen's (1989, pp. 306-311) method
(see Table 2). These estimates reveal that, on average, there
are negative within-semester changes in effort, in the perceived
importance of extrinsic pressures, in self-concept of ability,
and in intrinsic value. On the other hand, the average within-
semester change in students' perceptions of the utility value
of their coursework is slightly positive. Finally, the standard
deviations in Table 2 indicate that there is considerable vari-
ation among students in the within-semester changes that
they exhibit. 

Covariance Structure Models

The confirmatory factor analysis supported the hypothesis
that the within-semester change scores computed from stu-



1 Even though observed difference scores provide unbiased esti-
mates of true change, many authors have criticized the use of these
scores (e.g., Bereiter, 1963; Bohrnstedt, 1969; Kessler, 1977; Linn &
Slinde, 1977; O'Connor, 1972). However, as Willett (1988, p. 367)
concluded, recent methodological research has revealed that the
purported deficiencies of difference scores "are perceived rather than
actual, imaginary rather than real (Rogosa, Brandt, & Zimowski,
1982; Rogosa & Willett, 1983, 1985; see also Zimmerman, Brotohu-
sodo, & Williams, 1981; Zimmerman & Williams, 1982)."
Table 1

Factor Correlations

Factor        1      2      3      4      5
1. ΔINT       -
2. ΔEXT      .24     -
3. ΔUTI      .32            -
4. ΔSCA      .73    .15    .34     -
5. ΔEFF      .52    .36    .31    .58     -

Note. ΔINT = Change in the Intrinsic Value of the Subject Matter;
ΔEXT = Change in the Importance of Extrinsic Pressures for
Achievement; ΔUTI = Change in the Utility Value of the Course;
ΔSCA = Change in Self-Concept of Ability; ΔEFF = Change in
Effort.



dents' responses to the items in the Appendix measure the
five correlated factors that they were intended to measure.
Therefore, in the covariance structure analyses, we used the
hypothesized five-factor structure as the measurement model.
In these analyses, each factor was measured by its four best
indicators. The metric of each factor was set (with the refer-
ence indicators listed in Table 2) to measure within-semester
change on a 13-point scale: the maximum possible positive
change was +6 and the maximum possible negative change
was -6.²

As described earlier, four alternative causal models were 
tested. In the following sections, we describe the results of 
analyses conducted to test the fit of each model to the empir- 
ical data. 

Model 1. The causal relations specified in Model 1 (de- 
picted in Figure 1) reflect our hypothesis that when students 
lower or raise their estimate of ability in a subject, their effort 
in the subject is affected, as is their valuing of the subject (e.g.,
"If I discover I'm good at a subject, I'm more willing to put
forth effort in the subject, and I'm more likely to perceive the
subject to be interesting and useful"). In addition, Model 1
specifies that changes in the importance of extrinsic pressures 
(e.g., an increased desire to please parents or to obtain a good 
grade) affect effort. 

Unstandardized maximum-likelihood parameter estimates 
for Model 1 were obtained separately for junior high school
students and senior high school students with simultaneous
multisample analysis in LISREL VI. These estimates are
reported in Figure 1.³ For each path, the estimate for junior
high school students is listed first (to the left of the slash).
Model 1 explains 70% of the variance in change in effort
levels in junior high school students and 34% of this variance 
in senior high school students. In both junior high school and 
senior high school, a within-semester change in students' self- 
concept of ability has a substantial and statistically significant 
impact on their effort in that course. For example, for junior 
high school students, an increase in self-concept of ability of 
1 point is associated with an increase in effort of 0.7 points. 
Change in the importance of extrinsic pressures has a signifi- 
cant impact on change in effort in junior high school but not 
in senior high school. In both junior high school and senior 
high school, change in self-concept of ability is positively 
associated with change in intrinsic value and with change in 
utility value. Change in self-concept of ability explains over 
50% of the variance in change in intrinsic value and over 
10% of the variance in change in utility value in both samples.
Finally, Model 1 fits the data well (χ²/df ratio = 1.15, TLI =
.94).

Model 2. Model 2 (depicted in Figure 2) differs from 
Model 1 in that it specifies that some of the effect of change 
in self-concept of ability on change in effort is indirect. That 
is, in addition to directly affecting effort, change in ability
perceptions indirectly affects effort by causing changes in the 
perceived intrinsic value and utility value of the subject mat- 
ter. 

As in Model 1, the parameter estimates in Model 2 are
consistent with the following assertions: (a) within-semester
changes in students' course-related ability perceptions affect
both students' effort and their valuing of the subject matter, 
and (b) in junior high school, change in the perceived impor- 
tance of parents and grades as extrinsic motivators leads to 
changes in effort. In addition, there are several indications 
from the output associated with Model 2 that the direct effect 
of change in ability perceptions on change in effort may be 
more important than its indirect effects. First, Model 2 (which 
contains the indirect effects) does not fit the data significantly 
better than Model 1, Δχ² = 3.73, Δdf = 4, ns. Second, the
parameter estimates from Model 2 suggest that the direct
effect of change in self-concept of ability on change in effort
is larger than the sum of the indirect effects: the estimated 
direct effect is 0.56 in junior high school and 0.41 in senior 
high school, whereas the sum of the indirect effects is only 
0.14 in junior high school and 0.16 in senior high school. 
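
The comparison of direct and indirect effects can be made concrete with a short sketch. It assumes the usual path-analytic convention that the total effect is the direct effect plus the sum of the indirect effects; the function name `decompose` is hypothetical, and the four estimates are the ones reported in this paragraph.

```python
def decompose(direct_effect, indirect_effect_sum):
    """Return the total effect and the share of it carried by the direct path."""
    total = direct_effect + indirect_effect_sum
    return total, direct_effect / total


# Model 2 estimates reported in the text: (direct effect, sum of indirect effects).
estimates = {
    "junior high": (0.56, 0.14),
    "senior high": (0.41, 0.16),
}

for group, (direct, indirect) in estimates.items():
    total, share = decompose(direct, indirect)
    print(f"{group}: total = {total:.2f}, direct share = {share:.2f}")
# junior high: total = 0.70, direct share = 0.80
# senior high: total = 0.57, direct share = 0.72
```

Under this convention the junior high total of 0.70 agrees with the 0.7-point effect reported for Model 1, in which the effect of self-concept of ability on effort is entirely direct.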

It should be noted, however, that the high correlation 
between change in self-concept of ability and change in in-
trinsic value (.73) makes it difficult to estimate precisely the
relative magnitudes of the direct effect and the indirect effects. 
Because of this high correlation, the parameter estimates for 
the direct effect and one of the indirect effects are highly 
correlated (-.80). One result of the correlation between these 
two parameter estimates is that the standard error associated 
with the effect of change in self-concept of ability on change 
in effort is 1.7 times higher in Model 2 than it is in Model 1.
For this reason, in Model 2 the significance level associated 
with this effect does not reach conventional levels of signifi- 
cance. 

Model 3. Another way of testing the importance of the
direct effect of change in self-concept of ability on change in
effort is to compare the fit of a model that contains the direct



2 When interindividual differences in true change are small, the
difference between two observed measures tends to be less reliable
than either individual measure. In regression analyses and other
traditional statistical techniques, unreliable measures increase bias in
parameter estimates involving those measures as predictors and in-
flate standard errors of estimate. However, if change in each construct
is measured by several difference scores, covariance structure mod-
eling techniques can be used (as we use them here) to obtain param-
eter estimates that are unbiased by the random measurement error
in the difference scores.

3 There are three types of parameter estimates reported in Figures
1-4, each represented by a different type of line. A curved line with
arrowheads at both ends represents the covariance between exogenous
factors. The direct effect of one factor on another is represented by a
straight line (or by connected line segments) with one arrowhead to
show the assumed direction of causation. Finally, an arrowhead
without a tail represents specification error (error in equations or
omitted variables).
Table 2

Reference Indicators for Five Change-Score Factors and Estimated Means and Standard Deviations for These Factors

Factor  Wording of reference indicator                                                     Maximum possible change                                              M change  SD change
ΔINT    How excited are you to learn about this subject matter?                            +6 (from not at all excited to very excited)                         -0.17     1.12
ΔEXT    Is doing as well as your parents expect you to do in this class important to you?  +6 (from not at all important to me to very important to me)         -0.43     1.12
ΔUTI    I am taking this class because it helps prepare me for a job.                      +6 (from not an important reason at all to a very important reason)  +0.07     1.47
ΔSCA    How good are you in this subject?                                                  +6 (from not good at all to very good)                               -0.22     0.91
ΔEFF    How hard are you working to learn about this subject?                              +6 (from not hard at all to as hard as I can)                        -0.55     0.91

Note. ΔINT = Change in the Intrinsic Value of the Subject Matter; ΔEXT = Change in the Importance of Extrinsic Pressures for Achievement;
ΔUTI = Change in the Utility Value of the Course; ΔSCA = Change in Self-Concept of Ability; ΔEFF = Change in Effort.



effect (e.g., Model 2) with a nested model that eliminates the
direct effect (Model 3 in Figure 3). The fit of Model 3 is
significantly worse than the fit of Model 2: Δχ² = 7.18, Δdf
= 2, p < .05. Thus, the evidence suggests the existence of the
direct effect.
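
The nested-model comparison above is a chi-square difference test, and it can be checked with a few lines of arithmetic. The sketch below is an illustration, not the authors' computation; it relies on the fact that a chi-square variable with 2 degrees of freedom has upper-tail probability exp(-x/2), so no statistics library is needed. The function name is hypothetical.

```python
import math


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with 2 df.

    For exactly 2 degrees of freedom the survival function has the
    closed form exp(-x / 2); this shortcut does not apply to other df.
    """
    return math.exp(-x / 2.0)


delta_chi2 = 7.18  # difference in chi-square between Model 3 and Model 2
delta_df = 2       # difference in degrees of freedom (must be 2 here)

p_value = chi2_upper_tail_df2(delta_chi2)
print(f"p = {p_value:.3f}")  # p = 0.028
```

The resulting probability falls below .05, in agreement with the reported result.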

In conclusion, the results from the first three models suggest
that (a) change in ability perceptions has an important
direct effect on change in effort; (b) change in the perceived
importance of extrinsic pressures for achievement has a
significant positive impact on effort in junior high school but
not in senior high school; and (c) as students' ability perceptions
in a subject increase, not only do they try harder but
they also enjoy the subject more and perceive the subject as
more useful.
Figure 1. Unstandardized parameter estimates for Model 1. (ΔSCA
= Change in Self-Concept of Ability; ΔINT = Change in the Intrinsic
Value of the Subject Matter; ΔEFF = Change in Effort; ΔEXT =
Change in the Importance of Extrinsic Pressures for Achievement;
ΔUTI = Change in the Utility Value of the Course. Values to the left
of the slash are for junior high school students, and values to the right
of the slash are for senior high school students. There are three types
of parameter estimates, each represented by a different type of line.
A curved line with arrowheads at both ends represents the covariance
between exogenous factors. The direct effect of one factor on another
is represented by a straight line [or by connected line segments] with
one arrowhead to show the assumed direction of causation. Finally,
an arrowhead without a tail represents specification error [error in
equations or omitted variables]. An asterisk indicates a coefficient
that is greater than or equal to 1.96 times its standard error.
Tucker-Lewis index = .94.)



Models 1, 2, and 3 assume that the effect of change in self-concept
of ability on change in effort does not depend on the
level of students' ability perceptions at the beginning of the
course. This assumption was confirmed with multiple regression
analyses. In these analyses, students were first categorized
into quartiles on the basis of their ability perceptions at the
beginning of the semester: low perceived ability (an average
rating of 4 or less), moderate perceived ability (an average
rating greater than 4 but less than or equal to 5), high
perceived ability (an average rating greater than 5 but less
than or equal to 5.75), and very high perceived ability (an
average rating greater than 5.75). Then we estimated the
simple regression of change in effort on change in self-concept
of ability separately for each quartile and tested whether the
regression coefficients in the separate regression equations
were significantly different from each other (i.e., whether or
not the regression lines were parallel). These tests revealed
that the positive impact of increased ability perceptions on
effort is not significantly different for students who begin the
year with perceptions of very high, high, moderate, or low
ability: in the junior high school sample, F(3, 142) = 1.60,
p = .19, MSe = .69, and in the senior high school sample,
F(3, 136) = 1.46, p = .23, MSe = .73.
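
The procedure just described (grouping students by initial ability perception, fitting a simple regression in each group, and testing whether the lines are parallel) can be sketched as follows. This is a generic test of homogeneity of slopes written for illustration; it is not the authors' code, the function names are hypothetical, and the data are made up. The cut points are the ones given in the text. Only the F statistic and its degrees of freedom are computed; obtaining a p value would require an F distribution routine.

```python
def perceived_ability_group(mean_rating):
    """Assign a group using the cut points reported in the text."""
    if mean_rating <= 4:
        return "low"
    if mean_rating <= 5:
        return "moderate"
    if mean_rating <= 5.75:
        return "high"
    return "very high"


def slope_homogeneity_f(groups):
    """F test of parallel regression lines across groups.

    groups: list of (xs, ys) pairs, one pair per group.
    Compares separate slopes per group against one common slope
    (each group keeps its own intercept in both models).
    Returns (F, df_numerator, df_denominator).
    """
    k = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)
    sse_separate = 0.0
    sxx_pool = sxy_pool = syy_pool = 0.0
    for xs, ys in groups:
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        syy = sum((y - mean_y) ** 2 for y in ys)
        sse_separate += syy - sxy ** 2 / sxx  # residual SS, own slope
        sxx_pool += sxx
        sxy_pool += sxy
        syy_pool += syy
    sse_common = syy_pool - sxy_pool ** 2 / sxx_pool  # residual SS, one slope
    df_num = k - 1
    df_den = n_total - 2 * k
    f_stat = ((sse_common - sse_separate) / df_num) / (sse_separate / df_den)
    return f_stat, df_num, df_den


# Made-up data: change in ability perception (x) and change in effort (y).
group_a = ([0, 1, 2, 3], [0.1, 0.9, 2.1, 2.9])
group_b = ([0, 1, 2, 3], [2.9, 2.1, 0.9, 0.1])

f_parallel, df1, df2 = slope_homogeneity_f([group_a, group_a])
f_crossing, _, _ = slope_homogeneity_f([group_a, group_b])
print(df1, df2, round(f_parallel, 6), round(f_crossing, 1))
```

With four groups the numerator has 3 degrees of freedom and the denominator has N - 8, which matches the form of the reported tests, F(3, 142) and F(3, 136).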

Model 4: A Supplementary Analysis to Test an Alternative
Explanation. The first three models all assume that changes
in self-concept of ability lead to changes in effort. An
alternative view is that ability perception change is a consequence
rather than a cause of effort change. Perhaps, for example, (a)
a student discovers that a subject is more interesting or more
enjoyable than he or she originally thought, (b) the student's
increased enjoyment prompts him or her to work harder, and
(c) the student's hard work increases his or her competence
and consequently his or her ability perceptions. This
alternative causal sequence is specified in Model 4 (Figure 4). Model
4 fits the data less well than do Models 1, 2, and 3; for Model
4, χ²/df ratio = 1.24, TLI = .91.*
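
For readers unfamiliar with the fit statistics quoted here, the Tucker-Lewis index (TLI) compares the chi-square/df ratio of a fitted model with that of a baseline (null) model. The sketch below shows the standard formula. The chi-square and df values passed in are hypothetical, because the article does not report the null-model values, and the function name is likewise an assumption.

```python
def tucker_lewis_index(chi2_null, df_null, chi2_model, df_model):
    """Tucker-Lewis index from null-model and fitted-model chi-squares."""
    null_ratio = chi2_null / df_null
    model_ratio = chi2_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1.0)


# Hypothetical values, for illustration only (not taken from the article).
tli = tucker_lewis_index(chi2_null=500.0, df_null=100,
                         chi2_model=150.0, df_model=100)
print(round(tli, 3))
```

A model whose chi-square/df ratio is close to 1 yields a TLI close to 1, so a larger ratio and a smaller TLI both indicate poorer fit.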



* At the suggestion of an anonymous reviewer, we also tested the
explanatory power and overall fit of several state-dependence models
that specify that students' beginning-of-semester perceptions (of
intrinsic value, utility value, ability, and extrinsic pressures) determine
subsequent effort changes. These models had substantially poorer
explanatory power and poorer overall fit than the models reported in
this article.
Figure 2. Unstandardized parameter estimates for Model 2. (ΔSCA
= Change in Self-Concept of Ability; ΔINT = Change in the Intrinsic
Value of the Subject Matter; ΔEFF = Change in Effort; ΔEXT =
Change in the Importance of Extrinsic Pressures for Achievement;
ΔUTI = Change in the Utility Value of the Course. Values to the left
of the slash are for junior high school students, and values to the right
of the slash are for senior high school students. There are three types
of parameter estimates, each represented by a different type of line.
A curved line with arrowheads at both ends represents the covariance
between exogenous factors. The direct effect of one factor on another
is represented by a straight line [or by connected line segments] with
one arrowhead to show the assumed direction of causation. Finally,
an arrowhead without a tail represents specification error [error in
equations or omitted variables]. An asterisk indicates a coefficient
that is greater than or equal to 1.96 times its standard error. A dagger
indicates a coefficient that is greater than or equal to 1.65 times its
standard error. Tucker-Lewis index = .94.)
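
The asterisk and dagger conventions used in the figure captions amount to comparing each coefficient with a multiple of its standard error. A minimal sketch of that rule follows; the function name and the example values are hypothetical, and taking the absolute value of the estimate (so that negative coefficients are handled) is an assumption of this sketch.

```python
def significance_marker(estimate, standard_error):
    """Return the caption marker implied by estimate / standard error."""
    ratio = abs(estimate) / standard_error
    if ratio >= 1.96:
        return "asterisk"
    if ratio >= 1.65:
        return "dagger"
    return ""


# Hypothetical coefficients, each with a standard error of 0.20.
for estimate in (0.50, 0.34, 0.20):
    print(estimate, significance_marker(estimate, 0.20))
```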



Mean Differences in Within-Semester Change Among
Courses of Different Types

As reported earlier, averaging across the different types of 
courses in our sample, we found that students showed negative 
within-semester changes in effort, in the perceived importance 
of extrinsic pressure, in self-concept of ability, and in the 
intrinsic value of the subject matter. Although our study was 
not designed to allow definitive statements about differences 
among courses of different types, the data in Table 3 indicate 
that — at least in our sample — negative within-semester 
changes tend to be more prevalent and more pronounced in 
math courses than in other types of courses. Mean change in 
intrinsic value, change in self-concept of ability, and change 
in effort scores of students in math courses were significantly 
lower than those of students in every other course type. 
Whereas students in math courses did not differ from students 
in science, social studies, or elective courses in their increas- 
ingly negative perceptions of the course's utility value, stu- 
dents in English courses came to view English as significantly 
more useful as the semester progressed. There were no significant
course-type differences on change in the importance of
extrinsic pressures, F(3, 303) = 1.33, p = .26, MSe = 0.89;
students in all courses showed negative changes on this factor.

Although there were significant mean differences among
course types in change in intrinsic value, change in utility
value, change in self-concept of ability, and change in effort,
this does not imply that effort changes are related differently
to task value or to ability perception changes in courses of
different types. On the contrary, regression analyses reveal
that course type does not affect the relations of change in
effort with change in intrinsic value, change in utility value,
change in self-concept of ability, or change in the importance
of extrinsic pressures. For example, we estimated the simple
regression of effort change on ability-perception change
separately for math, English, science, and other courses. Then
we tested for significant differences in the estimated regression
coefficients. These tests revealed that the positive impact of
increased ability perceptions on effort is not significantly
different for students in different courses, F(3, 246) = 0.87, p
= .46, MSe = 0.55.

Discussion 

As mentioned earlier, in virtually every cognitive theory of
motivation, ability perceptions are assumed to affect student
effort and thus to have practical, educational importance. In
an implicit endorsement of these theories, many of the task
forces and commissions attempting to reform the schools
attended by young adolescents have emphasized the importance
of providing young adolescents with experiences and
interactions that help them develop a self-image of intellectual
competence (e.g., Carnegie Task Force on the Education of
Young Adolescents, 1989; Children's Defense Fund, 1988;
Maryland Task Force on the Middle Learning Years, 1989).
The results of this study suggest that this emphasis on perceptions
of ability may be worthwhile. Models 1 and 2 (which
explicitly assume that within-semester change in ability
perceptions is a direct determinant of within-semester change in
Figure 3. Unstandardized parameter estimates for Model 3. (ΔSCA
= Change in Self-Concept of Ability; ΔINT = Change in the Intrinsic
Value of the Subject Matter; ΔEFF = Change in Effort; ΔEXT =
Change in the Importance of Extrinsic Pressures for Achievement;
ΔUTI = Change in the Utility Value of the Course. Values to the left
of the slash are for junior high school students, and values to the right
of the slash are for senior high school students. There are three types
of parameter estimates, each represented by a different type of line.
A curved line with arrowheads at both ends represents the covariance
between exogenous factors. The direct effect of one factor on another
is represented by a straight line [or by connected line segments] with
one arrowhead to show the assumed direction of causation. Finally,
an arrowhead without a tail represents specification error [error in
equations or omitted variables]. An asterisk indicates a coefficient
that is greater than or equal to 1.96 times its standard error.
Tucker-Lewis index = .93.)
Figure 4. Unstandardized parameter estimates for Model 4. (ΔEXT
= Change in the Importance of Extrinsic Pressures for Achievement;
ΔINT = Change in the Intrinsic Value of the Subject Matter; ΔEFF
= Change in Effort; ΔSCA = Change in Self-Concept of Ability;
ΔUTI = Change in the Utility Value of the Course. Values to the left
of the slash are for junior high school students, and values to the right
of the slash are for senior high school students. There are three types
of parameter estimates, each represented by a different type of line.
A curved line with arrowheads at both ends represents the covariance
between exogenous factors. The direct effect of one factor on another
is represented by a straight line [or by connected line segments] with
one arrowhead to show the assumed direction of causation. Finally,
an arrowhead without a tail represents specification error [error in
equations or omitted variables]. An asterisk indicates a coefficient
that is greater than or equal to 1.96 times its standard error.
Tucker-Lewis index = .91.)

effort) fit the data better than do the alternative models
considered here. In other words, the results are consistent
with the claim that, by reducing the number of students who
believe that they are "not good" in a subject, teachers can
increase the number of students who work near their potential.

The findings suggest that increasing students' perceptions
of ability will achieve another important goal: for students to
value the subject they are learning. Students whose ability
perceptions in a subject increase find the subject to be more
interesting and more useful; conversely, students devalue
subjects that they do not believe they have the ability and
skills to master. The effect of ability perceptions on values
may have long-term implications. For example, greater valuing
of a subject may result in students seeking further learning
opportunities in that subject area.



The results of this study support Eccles & Wigfield's (1985a)
contention that expectancies (as measured by self-concept of
ability) and task values are positively related in naturally
occurring achievement settings. Students come to value those
subjects at which they believe they can succeed. This relationship
is the opposite of the relationship between expectancies
and incentive value proposed by Atkinson (1964). Clearly,
the distinction between the incentive value of a task (defined
narrowly in terms of anticipated pride accompanying success)
and task values (more broadly construed and applied to
natural achievement contexts) has important theoretical and
practical implications.

Previous research has found that students' beliefs concerning
the value of academic subjects influence their course
enrollment decisions (e.g., Chipman, Brush, & Wilson, 1985;
Eccles, Adler, & Meece, 1984). The results of this study
suggest, however, that once a student is enrolled in a course,
changes in value perceptions may not affect effort. Model 1
(which assumes that effort change and changes in the valuing
of a subject are correlated only because of their common
dependence on changes in ability perceptions) fits the data
quite well. Furthermore, in Model 2, the estimated effects of
intrinsic value and utility value changes on effort were small
and nonsignificant. Only Models 3 and 4, which omit the
direct effect of ability perception change on effort change and
which fit worse than Models 1 and 2, yield any significant
effects of value-of-subject changes on effort changes. A useful
goal of future research would be to delineate the circumstances
under which students' perceptions of the subjective value of
a task play an important role in determining effort, after
controlling for ability perceptions.

For many junior high school students, the desire to please
one's parents by getting a good grade is an important reason
for putting forth effort in a class. Increases in the perceived
importance of this type of extrinsic pressure for achievement
were significantly associated with increased effort among junior
high school students but not among senior high school
students. This developmental difference probably reflects the
declining importance of parental norms and pressures in
influencing students' achievement behavior from early to late
adolescence (e.g., Montemayor, 1986). It is possible that by
high school peer-related extrinsic pressures supersede parental
pressures. Of course, peer-based pressures for or against
achievement may be important even during the upper-elementary
and middle grades (e.g., Slavin, 1986). The relative
effects of perceived peer and parent pressure on effort would
be useful to examine in future studies.

It is noteworthy that the best model tested in this study
explained twice as much of the variance in effort for the junior
high school students (70%) as for the senior high school
students (34%). Apparently the four psychological factors
assessed in the study give a more complete picture of the
determinants of effort on school tasks for the younger adolescents.
Further research needs to examine additional factors
that may influence older students' effort on school tasks. We
suspect that competing activities, such as athletic training,
jobs, and peer relationships, affect time spent on schoolwork
for senior high school students more than for junior high
school students.

Educators are searching continually for "promising practices"
to improve students' motivation and performance. By
adding to our knowledge of the relative importance of factors
that affect effort and of the relations among these factors, the
results of this study may help teachers select strategies to
increase student effort. The results suggest that the most
fruitful approach to increasing student effort may involve
altering curriculum and instruction, task structures, grouping
policies, and evaluation practices to reduce the proportion of
students who perceive themselves as having little academic
ability (Mac Iver, 1988; Mac Iver & Epstein, 1990). However,
raising students' confidence in their abilities is a complicated
and difficult task. Seemingly positive teacher behaviors that
are motivated by a desire to protect the self-concept of low
achievers often inadvertently play a role in damaging this
self-concept. For example, Graham & Barker (1990, p. 7) caution
that "praise for success at easy tasks, the absence of blame for
failure at such tasks, and affective displays of sympathy or
compassion can communicate to the recipients of this feedback
that they are low in ability (Barker & Graham, 1987;
Graham, 1984; Meyer et al., 1979; see also Weiner, Graham,
Taylor, & Meyer, 1983)." Increased perceptions of competence
cannot be effectively achieved by setting unchallenging
standards for success. On this point, we agree with Atkinson
that success easily achieved engenders little pride (and also
fails to increase self-confidence).

One promising approach to raising students' ability perceptions
is to alter classroom evaluation and recognition practices
so that success is defined in terms of individually referenced
(and personally challenging) improvement goals. Mac Iver
(1990) described a program for middle-grade classrooms that
follows this approach. The program helps students to set
individualized, short-range improvement goals that are
challenging but doable. As young adolescents observe their progress
in obtaining these goals, many who have reached the
premature conclusion that they will "never be good at schoolwork"
may develop a renewed confidence in their academic
ability and a renewed enthusiasm for learning. A multiyear
evaluation study of this program is currently under way.

In addition to building students' self-confidence in their
ability, should teachers also stress the utility value of mastering
course content and strive to make school tasks intrinsically
interesting? These are undoubtedly good teaching strategies.
Attempts to make the content of a course more clearly useful
to students and to make assignments in a class more interesting
may increase students' enjoyment of a course and positive
attitudes toward school in general, and they may have
long-term effects on the degree to which students seek further
learning opportunities in that subject area. However, this
study suggests that strategies directed toward ensuring that all
students develop faith in their ability may have a greater effect
on students' effort and attitudes, at least in the short run (e.g.,
during the course of a semester), than would strategies directed
toward increasing the utility value and intrinsic interest of a
course. As Eccles and Wigfield (1985b) have argued,

One of the most important motivational questions facing a
student is "Can I succeed at this task if I choose to try?" . . . If
the answer is yes, then a student will, at least, move on to the
next question: "Do I want to?" If the answer is no, then the
student will, in all likelihood, give up. (p. 188)

Classroom practices that increase the number of children
who gain confidence in their ability may help create a
success-prone cycle in individual children. That is, increases in effort
(resulting from increases in perceived competence) may lead
students to succeed more frequently. This increased success
may prompt further increases in confidence and effort, thus
creating a success-prone cycle. On the other hand, despite the
importance of ability perceptions in motivating effort, heightened
ability perceptions will be of little use "unless accompanied
by the strategic knowledge that is essential to direct
the energy to appropriate ends" (Nickerson, 1988, p. 26; see
also Borkowski, Carr, Rellinger, & Pressley, in press). Thus,
one critical component of effective confidence-building programs
may be the provision of direct instruction in metacognitive
strategies.

One limitation of this study is its inability to identify and 
analyze differences among different socioeconomic and ethnic 
groups in the determinants of effort changes. We did not 
collect information on the socioeconomic status of individual 
students, and we have insufficient numbers of students within 
each minority group to permit a LISREL analysis of ethnic 
group differences. Simple regressions conducted within each 
ethnic group separately indicated that ability perception 
change is strongly related to effort change within every ethnic 
group. Nevertheless, future studies in which socioeconomic 
status measures are collected and in which larger minority 
group samples are evaluated may find that our best model 
does not satisfactorily fit the data from certain groups. 
Appendix

Questionnaire Items Administered at the Beginning and the End of a Semester to Measure
Within-Semester Change in Five Factors

Item | Response scale anchors | Loading

Change in the Intrinsic Value of the Subject Matter

How excited are you to learn about this subject matter? | Not at all excited; very excited | 1.06
How much do you enjoy learning about this subject? | Not much at all; very much | 1.14
How much do you care about learning a lot about this subject? | Don't care at all; care very much | 0.85
How much do you like working on the assignments in this class? | Not at all; very much | 0.90
Do you do things for fun outside of class that are related to or have something to do with what you are learning about in this class? | Never; yes, a lot | 0.57

Change in the Importance of Extrinsic Pressures for Achievement

When I work in this class, it is because I want to please my parents. | Not at all a reason; a very important reason | 0.79
Is doing as well as your parents expect you to do in this class important to you? | Not at all important to me; very important to me | 1.19
When I work in this class, it is because I want a good grade. | Not at all a reason; a very important reason | 0.54
How important is it to your parent(s) that you get a good grade in this class? | Not at all important; very important | 0.54

Change in the Utility Value of the Course

I am taking this class because it helps prepare me for a job. | Not an important reason at all; a very important reason | 1.46
When I work in this class, it is because the knowledge and skills are useful in my life and/or for my future. | Not at all a reason; a very important reason | 1.22
I am taking this class because I may need to know about this subject in the future. | Not an important reason at all; a very important reason | 1.33
I am taking this class because it helps me do things I want to be able to do. | Not an important reason at all; a very important reason | 0.96
How useful is what you learn in this class for a job you might want? | Not at all useful; very useful | 1.02
I am taking this class because it helps me decide what career or job I want. | Not an important reason at all; a very important reason | 1.14
How useful will what you learn in this class be for future classes you might take? | Not at all useful; very useful | 0.72
WITHIN-SEMESTER CHANGES IN EFFORT

Appendix (continued)

Item | Response scale anchors | Loading

Change in Self-Concept of Ability

How good are you in this subject? | Not good at all; very good | 0.87
How good do you think you are in this subject compared to other students in the class? | Much worse than other students; much better than other students | 0.88
How often do you feel smart in this class? | Never; very often | 0.68
How much natural ability do you have in this subject? | No ability at all; a lot of ability | 0.79

Change in Effort

If a student works to his or her highest potential in a class, then we could say that he or she is putting forth 100% effort to learn the subject matter. How much effort do you usually put forth in this class? | I am not trying at all (0%); I am working to my highest potential (100%) | 11.82
How hard are you working to learn about this subject? | Not hard at all; as hard as I can | 0.96
How hard do you study for tests in this class? | Just enough to pass; whatever it takes to get a good grade | 0.61
How hard do you work in this class? | Much less than most classes; much more than most classes | 0.72
If a student works to his or her highest potential in a class, then we could say that he or she is putting forth 100% effort to learn the subject matter. Please estimate how much effort each student listed below is putting forth in this class. [From teacher's questionnaire] | Student is not trying at all (0%); student is working hard enough to fulfill his or her highest potential (100%) | 4.30

Note. All items are from the student's questionnaire, with the exception of the final item. Response scales for each item range from 1 to 7,
with the exception of the first and last items in the Change in Effort factor, for which responses were rated on an 11-point scale, ranging from
0% to 100%. Factor loadings are derived from the standardized solution, in which the factors (but not the measured variables) have been
standardized. Thus, each loading indicates the expected change in the raw score of a measured variable given an increase of 1 standard score
in the factor. The raw difference scores for the first and last item in the Change in Effort factor have a possible range of -100 to +100; the
possible range for every other measured variable is -6 to +6.
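
The raw difference scores described in the Note are simply the end-of-semester response minus the beginning-of-semester response on the same item. The sketch below illustrates the computation and the possible ranges under that reading; the function names and the ratings are hypothetical, not data from the study.

```python
def difference_score(begin, end, scale_min=1, scale_max=7):
    """End-of-semester rating minus beginning-of-semester rating."""
    for value in (begin, end):
        if not scale_min <= value <= scale_max:
            raise ValueError(f"rating {value} is outside the response scale")
    return end - begin


def mean_change(pairs, scale_min=1, scale_max=7):
    """Average difference score over (begin, end) rating pairs."""
    scores = [difference_score(b, e, scale_min, scale_max) for b, e in pairs]
    return sum(scores) / len(scores)


# Hypothetical ratings on a 1-to-7 item from three students.
ratings = [(5, 4), (6, 6), (3, 4)]
print(mean_change(ratings))

# Extremes of the two scale types mentioned in the Note.
print(difference_score(7, 1))            # largest drop on a 1-to-7 item
print(difference_score(0, 100, 0, 100))  # largest gain on a 0%-to-100% item
```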
Preventing early reading failure
with one-to-one tutoring: 
A review of five programs 



Tutoring is the oldest form of instruction. Parents have 
always provided one-to-one instruction to their chil- 
dren, and learning settings from driving instruction to 
on-the-job training typically employ one teacher for each 
learner for at least part of the learner's instruction. 

In elementary and secondary instruction, one-to- 
one tutoring exists around the margins of group instruc- 
tion. For example, teachers often work with individual 
children during seat work periods, recess, study hall, or 
after school. Parents often hire tutors to work with their 
children. Tutoring is often used in special education, and 
sometimes in other remedial programs such as compen- 
satory education. 

The topic of tutoring has come to the fore in recent 
years because of a renewed focus on students who are 
at risk of school failure, coupled with a renewed com- 
mitment to see that all students learn basic skills in the 
early grades. In particular, modest effects of traditional 
U.S. Chapter 1/Title I pullout programs (Carter, 1984) 
and the loosening of restrictions on uses of Chapter 1 
funds have contributed to a broader range of services 
being provided under Chapter 1 funding. 

One-to-one tutoring is one option often being con- 
sidered or implemented. In recent years, increased flexi- 
bility in Chapter 1 and other factors have led to the use 
of tutors with first graders to prevent early reading fail- 
ure. Advocates of tutoring programs argue that first 
grade is a critical year for the learning of reading, and 



reading success in the early grades is an essential basis 
for success in the later grades. Clay (1979), for example, 
argues that early intervention for children who have 
problems learning to read is crucial to children's later 
success. For students who do not learn to read in tradi- 
tional classrooms or with traditional reading programs, 
one-to-one tutoring is a possible solution to preventing 
early reading failure. 

Research on Chapter 1 programs suggests that 
remediation of learning problems after the primary 
grades is largely ineffective (see Kennedy, Birman, & 
Demaline, 1986). It may be that it is easier to prevent 
learning problems in the first place than to attempt to 
remediate them in the later grades. Considering how 
much progress the average reader makes in reading 
between the first and last days of first grade, it is easy to 
see how students who fail to learn to read during first 
grade are far behind their peers and will have difficulty 
catching up. 

The major drawback to tutoring is its cost. 
Providing tutoring to large numbers of students across 
the grade span would, of course, be prohibitive. But if in 
fact early intervention can prevent children from experi- 
encing failure and can help them get off to a successful 
start in school, the use of this expensive intervention 
may be cost effective in the long run. 

The importance of understanding the effects of 
first-grade tutoring goes far beyond the pedagogical and 
technical issues involved. Edmonds's (1981) statement
that every child can learn and Bloom's (1981) assertion 
to the same effect contributed to a variety of discussions 
among policy makers about learning as an "entitlement" 
for all children, on the basis that if every child can learn, 
then schools have an ethical and perhaps legal responsi- 
bility to see that every child does learn. One manifesta- 
tion of this point of view is a document produced by the 
Council of Chief State School Officers (1987) that 
describes model state statutes to entitle every U.S. child 
not only to an appropriate education but to success in 
achieving an acceptable level of performance (also see 
Council of Chief State School Officers, 1989). If success 
is seen as an entitlement, educators must have methods 
that produce success for all nonretarded children regard- 
less of home background, no matter how expensive 
these methods may be. In any discussion along these 
lines, one-to-one tutoring for at-risk students is sure to 
be one element of the strategy to ensure success for all. 

Recently, there has been an unprecedented willingness
among educators to adopt expensive early intervention
programs if they are believed to reliably produce large
effects. Examples of this include Project STAR in Tennessee
(Word et al., 1990) and Project Prime Time in
Indiana (Fan, Quilling, Bessel, & Johnson, 1987), which
have implemented substantially reduced class sizes in
the early elementary grades. Growing provision of
preschool and extended day kindergarten programs and
of IBM's Writing to Read computer program are other
examples. Recently, many districts have adopted Reading
Recovery and Success for All, intensive reading programs
with tutoring, as means of preventing early school
failure.

It would be important to know the effectiveness of 
such programs that are expensive to implement and 
maintain in school districts. If school districts plan to 
allocate Chapter 1 funds to expensive programs, the 
effectiveness of these programs should be of great con- 
cern. It is important to know how large the effect of 
tutoring is (in comparison to plausible alternatives), to 
what degree effects of tutoring are maintained over time, 
and which specific tutoring programs and practices pro- 
duce the largest gains in student reading achievement. 
The purpose of this article is to review the research on 
the effectiveness of one-to-one tutoring programs to 
identify what is currently known about the answers to 
these and other questions. 

Previous reviews of research on tutoring have pri- 
marily focused on peer tutoring (e.g., Devin-Sheehan, 
Feldman, & Allen, 1976; Scruggs & Richter, 1985). The 
one review that included tutoring by adults primarily 
focused on applications in special education (Polloway, 
Cronin, & Patton, 1986). None of these earlier reviews 
discussed any of the first-grade reading prevention models
emphasized here. 

In the present article, we consider the effectiveness 
of tutorial programs from two perspectives: empirical 
and pragmatic. From the empirical perspective, one can 
ask questions such as "Does the program work?" and 
"How strong are its effects?" To answer these questions, 
we computed effect sizes for each of the five programs. 
(This is discussed in detail in the section on review 
methods.) From a pragmatic standpoint, one can ask 
questions such as "What components of reading are 
included?" and "Does it matter if the tutors are certified 
teachers or paraprofessionals?" and "Why are some pro- 
grams apparently more effective than others?" 

It would also be important to examine the theoretical
similarities and differences of these programs regarding
the approach taken to learning in general, and reading
in particular, and how the relationship between the 
tutor/student dyad facilitates learning. One aspect of 
effectiveness of tutorial programs could be explained by 
appealing to domain-general theories such as Vygotsky's 
(1978) that have been formulated to account for the 
transmission of knowledge in one-to-one dyads. How- 
ever, while the Vygotskian perspective has been ex- 
plored with one program, Reading Recovery (see Clay & 
Cazden, 1990), theories to account for transmission of 
knowledge from tutor to student have not been explored 
in the other programs. Similarly, it would be important 
to examine the different theories of reading as espoused 
by advocates of each tutoring program. However, again 
with the exception of Reading Recovery, the programs 
do not articulate a theory of reading. 

In what follows, we review five tutoring programs. 
In the course of describing these programs, we discuss 
the model of reading to which each program subscribes 
and identify the key components of reading found in 
each program. From reviewing the curriculum of the 
tutoring programs, we have identified eight components 
of the reading process that are emphasized in these pro- 
grams: perceptual analysis of print, knowledge of print 
conventions, decoding, oral language proficiency, prior 
knowledge, lexical access, syntactic analysis of sen- 
tences, and prose comprehension. We acknowledge that 
this is by no means a complete list, since key aspects of 
reading such as phonemic awareness are not included. 
However, these components were extracted from the 
programs reviewed. We then discuss which components 
each program includes. We also consider the nature of 
the tutors and how the programs are implemented. Then 
we provide effect sizes to quantify the empirical effects of 
the programs. If one tutoring program appears to be 
more effective than another, it could be because (a) 
practical differences in the program lead to different outcomes
(e.g., certified teachers are used in one and not 
the other), or (b) tutors in one are using more effective 
methods or curricula than those in the other, or (c) different
programs emphasize to different degrees 
reading components that are considered to be central in 
contemporary theories of reading. In our discussion, we 
consider these and other explanations. 

Review methods 

This review uses a set of procedures called best- 
evidence synthesis, which combines elements of meta- 
analysis with those of traditional narrative reviews 
(Slavin, 1986). Briefly, a best-evidence synthesis requires 
locating all research on a given topic and discussing the 
substantive and methodological issues in the research as 
in a narrative review. A priori criteria for germaneness to 
the topic at hand and for methodological adequacy are 
typically applied. Whenever possible, study outcomes 
are characterized in terms of effect size (ES), the difference
between experimental and control means divided 
by the control group standard deviation. When means or 
standard deviations are not reported, effect sizes are estimated
from F, t, or other statistics (see Glass, McGaw, & 
Smith, 1981). The numerator of the effect size formula 
may be adjusted for pretests or covariates by computa- 
tion of gain scores or use of ANCOVA, but the denomi- 
nator is always the unadjusted individual level standard 
deviation of the control group or (if necessary) a pooled 
standard deviation. 
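
The effect size definition above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the procedure actually used for this review: the function names are hypothetical, and the conversion from t assumes an independent two-group comparison, in which case the result is standardized by the pooled standard deviation rather than by the control group standard deviation.

```python
import math


def glass_effect_size(mean_exp, mean_ctrl, sd_ctrl):
    """Effect size as defined in the text: the difference between the
    experimental and control means divided by the control group SD."""
    if sd_ctrl <= 0:
        raise ValueError("control group standard deviation must be positive")
    return (mean_exp - mean_ctrl) / sd_ctrl


def effect_size_from_t(t, n_exp, n_ctrl):
    """Approximate a standardized mean difference from an
    independent-samples t statistic (assumption: two independent groups).
    The estimate is in pooled-SD units."""
    return t * math.sqrt(1.0 / n_exp + 1.0 / n_ctrl)


def effect_size_from_f(f, n_exp, n_ctrl, favors_experimental=True):
    """With two groups, F equals t squared, so the direction of the
    effect is lost and has to be supplied by the reader of the study."""
    t = math.sqrt(f)
    if not favors_experimental:
        t = -t
    return effect_size_from_t(t, n_exp, n_ctrl)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(glass_effect_size(12.0, 10.0, 4.0))
    print(effect_size_from_t(2.0, 8, 8))
    print(effect_size_from_f(4.0, 8, 8))
```

When the numerator is adjusted for pretests, the adjusted mean difference would be passed in place of the raw difference, while the denominator stays the unadjusted standard deviation.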

Inclusion criteria. Studies were included in the 
present review if they evaluated one-to-one instruction 
delivered by adults (certified teachers, paraprofession- 
als, or volunteers) to students in the first grade who are 
learning to read for the first time. Studies had to com- 
pare tutoring to traditional instruction in elementary 
schools over periods of at least 4 weeks on measures of 
objectives pursued equally in experimental and control 
conditions. This duration requirement did not exclude 
any studies of first-grade tutoring. The first-grade 
requirement excluded only three studies (Bausell, 
Moody, & Walzl, 1972; Fresko & Eisenberg, 1983; and 
Shaver & Nuhn, 1971), which looked at remedial tutoring
in the third grade and higher. Studies of cross-age 
and same-age peer tutoring (e.g., Cloward, 1967; 
Greenwood, Delquadri, & Hall, 1989; and von Harrison 
& Gottfredson, 1986), did not fit this criterion and were 
not included. All studies ever written in English were 
included. The only study done outside of the U.S., by 
Clay (1985), examined only students who were success- 
ful in tutoring, not all who received it. This study is 
described in the section on Reading Recovery. 
Therefore, this best-evidence synthesis included all 
methodologically adequate studies of one-to-one tutoring
that focused on instruction delivered by adults to 
first graders. In a complete review of published as well 
as unpublished studies, a total of 16 studies met the 
inclusion criteria. 

Research on preventive tutoring 
programs 

All of the studies that met the inclusion criteria 
specified above evaluated a total of five tutoring pro- 
grams. These programs incorporated instructional materi- 
als as well as provision of one-to-one tutors. Some of the 
major characteristics of these programs are summarized 
in Table 1. Table 2 provides additional detail on models 
of reading used in each program. As is apparent from 
the Tables, the five programs vary widely in curriculum, 
integration with classroom instruction, use of certified 
versus paraprofessional tutors, and other factors not 
intrinsically related to the one-to-one setting. These pro- 
grams also differ in their model of reading and the mea- 
sures used to assess the effectiveness of these programs. 
As a result, we make no attempt to combine findings 
across studies in any way. However, we do discuss how 
different approaches to reading translate into the method 
used in the tutoring process. Finally, we discuss how 
ultimately the reading model is tied to the type of assess- 
ment each program uses to evaluate its effectiveness, 
suggesting that curriculum, instruction, and assessment 
are interrelated (Weade, 1987). 

Reading Recovery 

The preventive tutoring program that has received 
the most attention and use in recent years is Reading 
Recovery. This program was originally developed by 
Marie Clay (1985) in New Zealand, and is widely used in 
that country. In 1984-85, Marie Clay and a colleague, 
Barbara Watson, spent a year at the Ohio State 
University. They trained a group of teachers to use the 
program, and trained several Ohio State faculty members 
to train others. Since that time, research on Reading 
Recovery has been conducted at Ohio State, and the 
program has rapidly expanded in use. 

As applied in the longitudinal studies, Reading 
Recovery provides one-to-one tutoring to first graders 
who score in the lowest 20% of their classes on a pro- 
gram-developed diagnostic survey. The tutors are certi- 
fied teachers who receive training for 2.5 hours per 
week for an entire academic year. Students are tutored 
for 30 minutes each day until one of two things happens. 
If students reach the level of performance of their classmates
in the middle reading group, they are "discontinued."
If they receive 60 lessons without achieving this 
level of performance, the students are released from the 
program but considered "not discontinued." 

Table 1 Characteristics of preventive tutoring programs 

Reading Recovery 
  Location of evaluations: Ohio; Chicago, Illinois 
  Tutors: Certified reading teachers 
  Tutees: Lowest first graders 
  Duration: 30 minutes/day, ranging from 12 to 20 weeks 
  Tutoring methods and curriculum: Learning to read by reading. 
    Reading short stories and connecting writing activities to 
    reading. Tutors guide children to learn metacognitive 
    strategies. No connection to classroom instruction. 

Success for All 
  Location of evaluations: Inner-city Baltimore, Maryland 
  Tutors: Certified teachers 
  Tutees: Lowest first and second graders 
  Duration: 20 minutes/day, evaluated on 8-week cycle 
  Tutoring methods and curriculum: Learning to read by reading. 
    Closely integrated with structured classroom curriculum. 
    Emphasis on metacognitive strategies. 

Prevention of Learning Disabilities 
  Location of evaluations: New York; Ohio; California 
  Tutors: Certified teachers 
  Tutees: Lowest first and second graders 
  Duration: 30 minutes, 3 to 5 times/week 
  Tutoring methods and curriculum: Use of directed activities to 
    teach specific perceptual and spatial skills involved in 
    reading. Emphasis on skill acquisition. No emphasis on reading 
    connected text. No connection with a classroom curriculum. 

Wallach Tutoring Program 
  Location of evaluations: Inner-city Chicago, Illinois; rural 
    North Carolina 
  Tutors: Paraprofessionals 
  Tutees: Lowest first graders 
  Duration: 30 minutes/day, 1 year 
  Tutoring methods and curriculum: Phonics-based tutoring program. 
    Emphasis on systematic mastery of phonetic skills. Does not 
    focus on reading connected text. Not integrated with classroom 
    instruction. 

Programmed Tutorial Reading 
  Location of evaluations: Inner-city Indianapolis, Indiana; 
    Lenoir City, North Carolina 
  Tutors: Paraprofessionals 
  Tutees: All first graders 
  Duration: 15 and 30 minutes/day 
  Tutoring methods and curriculum: Highly detailed and prescribed 
    lessons with corresponding materials: includes sight-reading 
    program, comprehension, and word analysis. Emphasis on skills. 
    Partially integrated with classroom instruction. 


Table 2 Components of reading emphasized in tutoring programs 

                                                        Prevention of            Programmed
                                    Reading   Success   Learning       Wallach   Tutorial
Components                          Recovery  for All   Disabilities   Tutoring  Reading

Perceptual analysis of print        Yes       Yes       Yes            Yes       Yes
Knowledge of print conventions      Yes       No        Yes            No        No
Decoding                            Yes       Yes       Yes            Yes       Yes
Oral language proficiency           Yes       Yes       Yes            Yes       No
Prior knowledge                     Yes       Yes       No             No        No
Lexical access                      No        No        No             No        No
Syntactic analysis of sentences     No        No        No             No        No
Prose comprehension
  Prose structure                   No        No        No             No        No
  Story grammar                     No        No        No             No        No
  Inference making                  Yes       Yes       Yes            No        No
Reading strategies                  Yes       Yes       No             No        No
Metacognition and error detection   Yes       Yes       No             No        No
Error correction strategies         Yes       Yes       No             No        No

Model of reading 

In the Reading Recovery program, reading is 
viewed as a psycholinguistic process in which the reader 
constructs meaning from print (Clay, 1979; Pinnell, 
1985). According to Clay, reading is defined as a "mes- 
sage-gaining, problem solving activity, which increases 
in power and flexibility the more it is practiced." Clay 
states that within the "directional constraints of the print- 
er's code, language and visual perception responses are 
purposefully directed in some integrated way to the 
problem of extracting meaning from text, in sequence, to 
yield a meaningful communication, conveying the 
author's message" (Clay, 1979, p. 6). 

Clay does not specifically address how language 
and visual perception are coordinated in order to extract 
meaning from text. Nevertheless, her discussion of read- 
ing and components of the Reading Recovery Program 
suggest that she includes the following components of 
reading in her model: perceptual analysis, knowledge of 
print conventions, decoding, oral language proficiency, 
prior knowledge, inference making, reading strategies, 
metacognition and error detection, and error correction 
strategies (see Table 2). 

Clay (1979) describes reading as the "process by 
which the child can, on the run, extract a sequence of 
cues from printed texts and relate these, one to another, 
so that he can understand the precise message of the 
text." In order to master this process, the child must 
have good control of oral language, developed perceptual
skills, the physiological maturity and experiences 
that allow the child to coordinate what s/he hears in language
and sees in print, and enough hand-eye coordination
so s/he can learn the controlled, directional patterns 
required for reading (Clay, 1979). Expert teachers are 
assumed to have sufficient implicit knowledge of the 
processes that they can recognize the source of the 
child's difficulty. 

From this theory of reading, three major theoretical 
principles serve as a foundation for the Reading 
Recovery Program. First, reading is considered a strategic 
process that takes place in the child's mind. Reading 
requires the coordination of many strategies and visual 
information, the integrating of letter-sound relationships, 
features of print, and the child's own background knowledge.
Meaning is never derived just from the print alone, 
but from the interaction of the reader's unique background
and the print. Second, reading and writing are 
interconnected. Having the child make the connection 
between reading and writing is essential to literacy 
development. Third, "children learn to read by reading" 
(Pinnell, 1989; Pinnell, DeFord, & Lyons, 1988). Children 
must engage in reading of connected text and should 
avoid working on isolated skills in order to become proficient
in reading. It is only by reading frequently that 
the child can come to detect regularities and redundancies
present in written language. 

These three principles set the foundation for the 
Reading Recovery program. Children in Reading 
Recovery spend most of their time engaged in reading 
and writing activities. There is no systematic presentation 
of phonics, yet during the reading and writing activities, 
letter-sound relationships are taught as one of the basic 
strategies in solving problems. Tutors use a variety of 
strategies to help students develop "independent, self- 
generating systems for promoting their own literacy" 
(Pinnell, 1985). 

Structure of tutoring 

For the first few tutoring sessions, the teacher and 
student "roam around the known," reading and writing 
together in an unstructured, supportive fashion, to build 
a positive relationship and to give the teacher a broader 
knowledge of the child. After this, teachers begin to use 
a structured sequence of activities, as follows (adapted 
from Pinnell et al., 1988, pp. 10-11). 

The child rereads familiar books. The child reads 
again several favorite books that s/he has previously 
read. The materials are storybooks with natural language 
rather than controlled vocabulary. Books within a lesson 
may range from quite easy to more challenging, but the 
child is generally reading above 90% accuracy. During 
this time, the child has a chance to gain experience in 
fluent reading and in using strategies "on the run" while 
focusing on the meaning of the text. The teacher inter- 
acts with the child during and after the reading, not "cor- 
recting," but talking with the child about the story and 
supporting the effective actions the child has taken. 

The teacher analyzes reading using the running 
record. Each day the teacher takes a running record of a 
book that was new for the child the previous day. The 
running record is a procedure similar to miscue analysis 
(Goodman, Watson, & Burke, 1987). Using a kind of 
shorthand of checks and other symbols, the teacher 
records the child's reading behavior during oral reading 
of the day's selected book. The teacher examines run- 
ning records closely, analyzing errors and paying partic- 
ular attention to behavior such as self-correction. In this 
way, s/he determines the strategies the child is using to 
gain meaning from text. This assessment provides an 
ongoing picture of the progress the child makes. While 
the child is reading, the teacher acts as a neutral observ- 
er; the child works independently. The accuracy check 
tells the teacher whether the text was well selected and 
introduced the day before. 

Table 3 First-year evaluations of Reading Recovery 

                                Effect sizes
                          Pilot        Second
Measure                   cohort       cohort

Letter identification     +.36         -.04
Word test                 -.13         +.40
Concepts about print      +.60         +.65
Writing vocabulary        +.62         +.69
Dictation                 +.57         +1.03
Text reading              +.72         +.91

Note. Pilot cohort data are from Huck & Pinnell, 1986; second cohort data are 
from Pinnell, Short, Lyons, & Young, 1986. There are apparent ceiling 
effects on the letter identification and word tests. 

The child writes messages and stories and then 
reads them. Every day the child is invited to compose a 
message and to write it with the support of the teacher. 
Writing is considered an integral part of gaining control 
over messages in print. The process gives the child a 
chance to closely examine the details of written lan- 
guage in a message that s/he has composed, supported 
by her/his own language and sense of meaning. 
Through writing, the child also develops strategies for 
hearing sounds in words and using visual information to 
monitor and check her/his reading. 

After the construction of the message, the teacher 
writes it on a sentence strip and cuts it up for the child 
to reassemble and read. This activity provides a chance 
to search, check, and notice visual information. Using 
plastic letters on a magnetic board, the teacher may take 
the opportunity to work briefly with the letters to 
increase the child's familiarity with the names of letters 
and their use in known words such as the child's name. 
This work will vary according to the knowledge the 
child already has. 

The child reads new books. Every day the child is 
introduced to a new book that s/he will be expected to 
read without help the next day. Before reading, the 
teacher talks with the child about the book as they look 
at the pictures. The teacher helps the child build a frame 
of meaning prior to reading the text. The purpose of the 
introduction is not necessarily to introduce new words, 
but to create understanding in advance of reading so 
that it will be easier to focus on meaning. 

Every child's program differs. Children do a great 
deal of reading, but not from a graded sequence. No 
child reads the same series of books. The small books 
are carefully selected by the teacher for that child at that 
time. In writing, children work on their own messages, 
so they are writing and reading works that are important 
to them individually. The major difference within and 
across lessons lies in the teacher's ability to follow each 
child and to respond in ways that support acceleration 
and the development of strategies. Strategies may 
include directional movement, one-to-one matching, self- 
monitoring, cross-checking, using multiple cue sources, 
and self-correction. The Reading Recovery teacher uses 
instructional techniques designed to help the child 
develop and use such strategies. 

The tutoring model in Reading Recovery is sepa- 
rate from the instruction provided in the regular class- 
room. Most often, Reading Recovery teachers tutor stu- 
dents half time and either teach small groups of Chapter 
1 students or teach a regular class the other half. The 
tutees may thus have the same teacher as their reading 
teacher and as their tutor, but in general this does not 
occur. 

Tutor training in Reading Recovery is extensive. 
During the first year, in addition to teaching a reading 
class and tutoring four students, the tutors attend weekly 
seminars during which they receive training in observa- 
tional, diagnostic, and assessment techniques and are 
schooled in the reading philosophy of Marie Clay. The 
tutors also participate in weekly "behind the glass" 
demonstration lessons where they observe actual tutor- 
ing sessions behind a one-way mirror and have the 
opportunity to critique and discuss the lesson. 
Considerable time is spent learning about the reading 
process and learning how to implement appropriate 
strategies to meet the needs of individual children. 
Follow-up inservice training continues after the first year. 
Additional training is required of Teacher Leaders who 
are certified to train Reading Recovery tutors in their 
areas. Teacher Leaders participate in a 1-year internship 
at the Ohio State University training center (other states 
such as New York are establishing regional centers), 
where they participate in reading and writing seminars 
and learn to train tutors using the "behind the glass" 
technique. 

Results 

Research evaluating Reading Recovery in New 
Zealand (Clay, 1985) focused entirely on the discontinued
students (those who were successful in the program),
and therefore does not provide a full account of 
the effectiveness of the intervention. However, the U.S. 
research has included discontinued and not discontinued 
students — all of the students who either graduated from 
the program or received at least 60 lessons. 

The Ohio State group has conducted two longitudi- 
nal studies comparing Reading Recovery to traditional 
Chapter 1 pullout or in-class methods. The first (pilot) 
study (Huck & Pinnell, 1986; Pinnell, 1988) of Reading 
Recovery involved 21 teachers trained by Marie Clay 
who worked in six inner-city Columbus, Ohio, schools. 
Each school provided one Reading Recovery class and a 
matched comparison class. The lowest 20% of students 
in each class served as the experimental and control 
group, respectively. Students were pretested in 
September and December, 1984, but the tutoring did not 
begin until the spring semester, 1985. 

The second longitudinal study (DeFord, Pinnell, 
Lyons, & Young, 1988; Pinnell, Short, Lyons, & Young, 
1986) involved 32 teachers in 12 schools in Columbus. 
Twelve of these teachers had been tutors in the pilot 
cohort. In this study, students in the lowest 20% of their 
classes were randomly assigned to Reading Recovery or 
control conditions. The research design originally made 
a distinction between students in the experimental and 
control groups who had Reading-Recovery-trained versus
non-Reading-Recovery-trained teachers in their regular
reading program. However, there were no differences 
on this factor, so the analyses focused on tutored versus 
untutored children, regardless of who their regular read- 
ing teacher was. 

The results at the end of the first implementation 
year for the two Ohio State studies are summarized in 
Table 3. Reading Recovery students substantially outper- 
formed control students on almost all measures. The 
exceptions were tests of letter identification and a word 
recognition scale, which had apparent ceiling effects in 
both conditions. 

Each spring for 2 years following the implementa- 
tion year, all children were assessed on Text Reading 
Level, an individually administered test in which students 
are asked to read from books with progressively more 
difficult content. This measure yields a reading level 
(e.g., second grade, first semester). 

The results on this measure, summarized in Table 
4, show an interesting statistical paradox. By the criterion 
of effect size, the effects of Reading Recovery are clearly 
diminishing each year. By the end of the third grade, the 
effect size for the pilot cohort has diminished from +.72 
to +.14, and in the second cohort the effect size dimin- 
ished from +.78 to +.25. On the other hand, the differ- 
ence in raw units between Reading Recovery and control 
students remained about the same across all 3 years, 
hovering around two points in the pilot cohort and three 
in the second cohort. Is the effect maintaining or not? 

Table 4 Longitudinal evaluations of Reading Recovery 

                                 Effect sizes (raw differences)
                                 Pilot           Second
Time of evaluation               cohort          cohort

End of implementation year       +.72 (1.6)      +.78 (2.8)
1-year follow-up (Grade 2)       +.29 (2.0)      +.46 (3.0)
2-year follow-up (Grade 3)       +.14 (1.8)      +.25 (2.8)

Note. All data are from individually administered text reading level assessments 
developed by the program developers. Pilot cohort data are from Pinnell, 
1988; second cohort data are from DeFord, Pinnell, Lyons, & Young, 1988. 

The difference between these two measures is that 
the standard deviation of the Text Reading Level measure
increases each year, making the same raw difference
a smaller proportion of the standard deviation. In 
more substantive terms, the size of the difference may 
not be diminishing (assuming the measure is an equal- 
interval scale), but the importance of the difference is 
diminishing. For example, a difference of 3 months on a 
standardized reading test might be a big difference at the 
end of the first grade but is a small one at the end of 
sixth grade. 
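
The arithmetic behind this paradox can be checked with a short Python sketch. It uses the pilot cohort figures reported for Text Reading Level (+.72, +.29, and +.14, with raw differences of 1.6, 2.0, and 1.8) and back-calculates the standard deviation each pair implies. The implied standard deviations are not reported in the studies; they are an assumption that holds only if each effect size was computed as the raw difference divided by the standard deviation, with no covariate adjustment. The names used here are hypothetical.

```python
# Pilot cohort Text Reading Level results: (time of evaluation,
# effect size, raw difference in text reading levels).
pilot = [
    ("End of implementation year", 0.72, 1.6),
    ("1-year follow-up (Grade 2)", 0.29, 2.0),
    ("2-year follow-up (Grade 3)", 0.14, 1.8),
]


def implied_sd(effect_size, raw_difference):
    """Back out the SD from ES = raw difference / SD.

    Assumption: the effect size is an unadjusted raw difference
    divided by the standard deviation of the measure.
    """
    return raw_difference / effect_size


implied = [(label, implied_sd(es, raw)) for label, es, raw in pilot]

if __name__ == "__main__":
    for label, sd in implied:
        print(f"{label}: implied SD of about {sd:.1f} levels")
```

Under that assumption, the raw difference stays near two levels while the implied spread of scores grows several times over, which is why the effect size shrinks.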

Actually, there is a more complex story on the lon- 
gitudinal effects of Reading Recovery. The students who 
succeeded in Reading Recovery, those categorized as 
discontinued, were performing on average at a level like 
that of their classes as a whole, and substantially better 
than the comparison group of low achievers. On the 
other hand, all of the not-discontinued students (who 
had at least 60 tutoring sessions but failed to achieve at 
the level of the rest of their class) were still below the 
level of their classmates by third grade, and were sub- 
stantially lower than the control group. These not-dis- 
continued students represented 27% of the former 
Reading Recovery students tested in the third grade in 
the second cohort study (DeFord et al., 1988). 

Effects of Reading Recovery on promotions from 
grade to grade. Participation in Reading Recovery 
increased students' chances of being promoted to the 
second grade in comparison to the control low achiev- 
ers. Although 31% of comparison students were retained 
in first grade or assigned to special education, this happened
to only 22% of Reading Recovery students 
(DeFord et al., 1988). However, by the third grade this 
difference had mostly disappeared. Two years after the 
children were in the first grade, a total of 59.6% of 
Reading Recovery children and 57.8% of control children 
were in the third grade. A school 
district evaluation in Wakeman, Ohio, found that first-grade
retentions dropped from 24 to 1 in the 3 years 
after implementation of Reading Recovery (Lyons, 
Pinnell, DeFord, McCarrier, & Schnug, 1989). 

Table 5 Ohio statewide study of Reading Recovery 
(adjusted effect sizes in comparison to 
control groups) 

                      Reading     Reading     DISP        Reading-Writing
Measure               Recovery    Success     Tutoring    Group

February
  Dictation           +.65        +.45        -.05        +.14
  Text Reading Level  +1.50       +.45        -.01        +.41
  Woodcock            +.49        +.04        +.25        +.23
  Gates               +.51        +.27        +.14        +.23

May
  Gates-MacGinitie    +.19        -.14        -.05        +.34

October
  Dictation           +.35        +.00        -.25        +.29
  Text Reading Level  +.75        +.07        +.06        +.32

Adapted from Pinnell, Lyons, DeFord, Bryk, & Seltzer, 1991. 

One additional study compared Reading Recovery 
to control treatments in first grade. This was a study con- 
ducted in four Chicago elementary schools. As in the 
earlier studies, students were randomly assigned to 
Reading Recovery or control conditions. Because neither 
standard deviations nor statistical tests are presented, 
effect sizes cannot be computed, but program effects in 
comparison to control students were clearly substantial. 
Applying standard deviations from the Ohio studies to 
the same measures used in Chicago yields end-of-first-grade
effect sizes of approximately +.90 on dictation and 
text reading level. 

The most recent major study of Reading Recovery 
conducted by the Ohio State group (Pinnell, Lyons, 
DeFord, Bryk, & Seltzer, 1991) evaluated the full program
in comparison to three alternative programs and a 
control group in 10 Ohio school districts. The treatments 
were as follows: 

1. Reading Recovery (RR) was implemented as in 
earlier assessments. 

2. Reading Success (RS) was the same as Reading 
Recovery except that teachers received a 2-week 
training session in the summer instead of the 
yearlong, 2 to 3 hours per week training with 
"behind the glass" demonstration teaching used 
in Reading Recovery. In comparison to Reading 
Recovery, this treatment tested the possibility 
that effects like those for the program as usually 
implemented could be obtained with far less 
extensive training, a major stumbling block to 
widespread diffusion. 



3. Direct Instruction Skills Plan (DISP) was an indi- 
vidual tutorial program that tested the possibility 
that the one-to-one tutoring, not the particulars 
of the Reading Recovery model, explains the 
effects of the program. DISP used direct instruc- 
tion in specific skills such as letter, sound, and 
word recognition, sequencing, filling in blanks, 
answering questions, and reading extended text. 
Teachers were encouraged to design lessons 
themselves to teach these and other skills. 

4. Reading and Writing Group (RWG) was a small 
group tutorial model taught by teachers who 
had been trained as Reading Recovery teachers. 
They used Reading Recovery materials and 
strategies but were asked to adapt them to the 
small group setting in their own ways. This treat- 
ment essentially tested the effects of the one-to- 
one tutoring aspect of Reading Recovery, hold- 
ing curriculum constant. 

5. Control group for each treatment was the 
Chapter 1 pullout program already in existence 
in each school. 

Four schools (one per treatment) were involved in 
each district. In each school that already had a Reading 
Recovery teacher, students were randomly assigned to 
RR or control (Chapter 1) treatments. In other schools, 
additional teachers were hired from the district's substi- 
tute lists to implement the RS or DISP tutoring models. 
Trained Reading Recovery teachers were added to 
schools to implement the Reading and Writing Group 
(RWG) treatment. Students were randomly assigned to 
treatment or control classes. 

The treatments were implemented starting early in 
first grade. Students were then assessed in February, in 
May, and again in the following October. The results are 
summarized in Table 5.

As is clear from Table 5, the effects varied consid-
erably according to measure and time of test administra-
tion. The February measures clearly favored Reading
Recovery on all measures and the Reading Success
model on the two measures developed as part of the
Reading Recovery program, Dictation and Text Reading
Level. However, the February measures are biased in 
favor of the three tutoring models. By February, the 
tutoring was concluded, and students moved into the 
Chapter 1 group program. In contrast, the RWG and 
Chapter 1 control group programs were yearlong inter- 
ventions, so measuring effects in February discriminates 
against them. 

Unfortunately, the only test given in May was the
standardized Gates-MacGinitie, which found few effects
for any treatment.

The October follow-up provides the best indication
of the effects of the four programs. The most positive
effects were found for Reading Recovery on Dictation
(ES = +.35) and Text Reading Level (ES = +.75). Neither
of the other two tutoring methods (RS and DISP) showed
any positive effects. It is interesting to note that after the
full program, it was the Reading and Writing Group
(RWG) treatment that had the most positive effects (ES =
+.29 for Dictation, +.32 for Text Reading Level). This
treatment also had the largest positive effects on the May
Gates-MacGinitie of all treatments (ES = +.34).

One important factor may be confounded with the 
effects of the four programs. The teachers in the two 
most successful treatments, Reading Recovery and 
Reading and Writing Group, were experienced Reading 
Recovery teachers who had a year of Reading Recovery 
training and at least a year of experience in implement- 
ing the program. In contrast, the Reading Success and 
DISP teachers were hired from the substitute list and 
may have been considerably less skilled and less 
experienced. 

At a minimum, the Ohio statewide study provides 
one more convincing evaluation of Reading Recovery, 
showing large effects, especially on Text Reading Level, 
which maintain into the school year following the inter- 
vention. The findings suggest that the yearlong training, 
the particular curriculum and instructional model used,
and the one-to-one aspect of the tutoring are all critical 
to the success of the model, but these conclusions may 
be tempered by possible differences in teacher quality in 
the groups that received shorter training (RS) and the 
alternative tutoring model (DISP). 

A few methodological issues about the Reading 
Recovery research are worth raising. First, there is an 
articulation between the Reading Recovery program and 
the measures used to evaluate the program, suggesting 
that what is taught is what is measured. The measures 
used were all individually administered scales designed 
either by Marie Clay and her associates or by the Ohio 
State researchers. Five of the measures, Letter 
Identification, Word Test, Concepts about Print, Writing 
Vocabulary, and Dictation, make up the Diagnostic 
Survey, which was developed by Clay. The Letter 
Identification test asks students to identify 54 letters in 
upper and lower case. The Word Test is a list of high-
frequency words from the basal reader used in the
school district. Concepts about Print asks the students to 
identify conventions of print and reading. The Writing 
Vocabulary has the children write down as many words 
as they can, starting with their own name, in 10 minutes. 
The Dictation test assesses children's ability to write 
down every word in a sentence that is read to them. In 
scoring this test, children are given credit for every 



sound correctly represented. The Text Reading Level is 
the sixth test administered in the Reading Recovery eval- 
uation. This test consists of a series of graded stories that 
the child reads. A running record of the child's oral read- 
ing is taken and then an accuracy level is calculated. 
These measures correspond to the model of reading in 
Reading Recovery. As discussed earlier, the reading
model emphasizes oral language, perceptual analysis, 
concepts of print, reading strategies, and metacognition. 
All of these aspects are emphasized in the outcome mea- 
sures. Therefore, children who were tutored in Reading 
Recovery were also more familiar with the assessment 
than were the children in the control groups. 
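The accuracy calculation mentioned for the Text Reading Level running record can be illustrated with a minimal sketch. It assumes the commonly used definition of accuracy as the percentage of words read correctly; the function name and the sample passage are hypothetical and are not taken from the Reading Recovery materials.

```python
def running_record_accuracy(words_read: int, errors: int) -> float:
    """Return the percentage of words read correctly in a passage.

    Assumption: accuracy = (words read - errors) / words read * 100.
    """
    if words_read <= 0:
        raise ValueError("words_read must be positive")
    if not 0 <= errors <= words_read:
        raise ValueError("errors must be between 0 and words_read")
    return 100.0 * (words_read - errors) / words_read


# Hypothetical passage: 120 running words with 6 uncorrected errors.
accuracy = running_record_accuracy(words_read=120, errors=6)
print(f"Accuracy: {accuracy:.1f}%")
```

Under this assumed definition, a child's level would be the hardest graded story read above some accuracy criterion; the criterion itself is not specified in the text above.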

It also appears that bias in favor of the kinds of 
skills taught in the program is most likely at the low lev- 
els of the Text Reading Level measure, where assess- 
ments focus on concepts of print, using pictures and pat- 
terns to guess story content, and other skills specifically 
taught in Reading Recovery. The finding of particularly 
large effects on Text Reading Level (in contrast to other 
measures) was especially pronounced in the Ohio
statewide study (Pinnell et al., 1991). 

Secondly, Reading Recovery has a policy of not 
serving students who have already been retained in first 
grade and students identified for special education. One 
of the reports (Pinnell et al., 1986) implies that some stu- 
dents originally selected for tutoring failed to make ade- 
quate progress in early tutoring sessions and were 
excused from tutoring (and therefore excluded from the 
evaluation). Any of these practices might have influ- 
enced the Reading Recovery sample by excluding the 
very lowest achievers. 

These criticisms aside, the effects of Reading 
Recovery are impressive at the end of the implementa- 
tion year, and the effects are maintained for at least 2 
years. In addition, the Ohio State researchers have stud- 
ied implementation issues that affect the quality of the 
program. For example, Lyons (1991) studied the effects
of duration of training on Reading Recovery teachers. 
Teachers who had a 2-week inservice program were 
compared to teachers who attended a yearlong training 
program. The results showed that students who had 
teachers who received more extensive training outper- 
formed students who had teachers in the 2-week pro- 
gram on Text Reading Level. 

In another study, Handerhan (1990) conducted a 
sociolinguistic analysis of teachers and children in 
Reading Recovery. Reading Recovery tutoring sessions 
were videotaped and sessions of four of the most and 
least successful teachers (based on what was accom- 
plished with the student) were analyzed. Handerhan 
(1990) found that across tutors there was consistency in 
how they structured the lessons regarding similarities in 
language, materials, and procedural techniques.
However, more successful tutors showed greater vari- 
ability in the strategies they used and the less successful 
tutors engaged more in presenting letters and words as 
discrete skills without reading for meaning. This study is 
important because it documents the variability in instruc- 
tion during tutoring as well as identifying what behaviors 
are necessary to be a successful tutor helping children 
learn to read. The rapidly expanding use of Reading 
Recovery throughout the U.S. (see Lyons, Pinnell, 
DeFord, McCarrier, & Schnug, 1989) shows that the pro- 
gram is practical to use. 

Success for All 

Success for All (Madden et al., 1991; Slavin, 
Madden, Karweit, Dolan, & Wasik, 1992; Slavin, Madden,
Karweit, Livermon, & Dolan, 1990) is a comprehensive 
schoolwide restructuring program that is designed pri- 
marily for schools serving large numbers of disadvan- 
taged students. Its main intention is to see that all chil- 
dren are successful in basic skills, particularly reading, 
the first time they are taught. One major element of 
Success for All is one-to-one tutoring by certified teach-
ers for students in Grades 1-3 who are having difficulties
learning to read. The program includes many other ele-
ments, such as a beginning reading program, preschool
and kindergarten programs, and family support services. 
However, for low-achieving first graders, who receive 
most of the tutoring services, the Success for All program 
can be seen primarily as a preventive tutoring program. 

Model of reading 

The Success for All tutoring program is based on 
research that "points to the need to have students learn 
to read in meaningful contexts and at the same time
have a systematic presentation of word attack skills" 
(Slavin et al., 1992). Its underlying philosophy is that 
there is certain regularity to language, and that direct 
presentation of phonics is viewed as a helpful strategy 
which children can use to figure out words. Children 
also need to build a strong sight vocabulary that will 
help in identifying words that are not decodable. Along 
with the systematic presentation of phonics, children 
engage in reading meaningful connected text. The 
Success for All program emphasizes that reading is a 
strategic process that takes place in the student's mind 
and that these strategies should be taught directly. 

Unlike Reading Recovery, Success for All does not 
articulate a complete theory of beginning reading. 
However, an underlying model of reading can be seen 
in the structure and content of the program. There are 
four components that drive the Success for All tutoring
program. First, children learn to read by reading mean- 



ingful text. Reading skills are not acquired by children 
learning isolated unconnected information about print. 
Second, phonics needs to be taught systematically as a 
strategy for cracking the reading code. Children engage 
in reading stories that are meaningful and interesting, yet
have a phonetically controlled vocabulary. Third, chil- 
dren need to be taught the relationship between reading 
words and comprehending what they read. Mere word 
recognition is not reading. The emphasis on comprehen- 
sion is directly related to the fourth component, the 
emphasis on children's need to be taught strategies to 
help them become successful readers. Children who 
have problems learning to read often do not know how 
to effectively use metacognitive strategies to help them 
read. Through direct instruction, children are taught 
when, how, and why they should use strategies. 

In summary, Success for All emphasizes the follow- 
ing components in its model of reading: perceptual 
analysis, decoding, prior knowledge, oral language profi- 
ciency, inference making, reading strategies, metacogni- 
tion and error detection, and error correction strategies. 

Structure of tutoring 

The tutoring model used in Success for All is differ- 
ent in many ways from that used in Reading Recovery. 
One difference is that in Success for All, the tutoring 
model is completely integrated with the reading pro-
gram. The tutor's most important responsibility is to
make sure that the student is making adequate progress 
on the specific skills and concepts being taught in the 
reading class. 

Another difference is that in Success for All, first
graders receive tutoring as long as they need it. 
Although most students receive tutoring for part of a 
year, some receive it all year and then continue to be
tutored into the second grade. The commitment in 
Success for All is to see that every child succeeds, that 
no child is retained or assigned to special education 
except under extreme circumstances. 

First graders are initially selected into tutoring in 
Success for All on the basis of individually administered 
informal reading inventories given in September. After 
that, however, students are assessed every 8 weeks in 
terms of their progress through the reading curriculum. 
On the basis of these 8-week assessments, students who 
are doing well may be rotated out of tutoring as other 
students are rotated into tutoring. The amount of tutor- 
ing received by a given student may vary from 8 weeks 
to the entire year or more. 

Students receive tutoring every day for 20 minutes. 
This time is usually scheduled during an hourlong social 
studies/science block, so that tutoring represents addi-
tional time in reading.

The tutors are certified teachers recruited in the
same way as other teachers. Each tutor teaches a 90-
minute reading class each day (to reduce class size for
reading) and then spends the rest of the day tutoring
three children per hour. Because the tutors teach a read- 
ing class, they are fully aware of what the reading pro- 
gram is; if a child is struggling with Lesson 37, the tutor 
knows exactly what is required for success in Lesson 37 
because he or she has taught it. 

In many cases, tutors work with students who are 
also in their morning reading class. When scheduling 
does not allow this, the student's reading teacher fills out 
a tutor/teacher communication form that indicates what 
lesson the student is working on in class and the 
teacher's assessment of the specific problems the student
is having with that lesson. The tutor uses this informa- 
tion to plan the tutoring session. This communication 
ensures coordination between the classroom instruction 
and tutoring. 

The tutors receive 2 days of training (along with all 
other beginning reading teachers) to learn to teach the
Success for All beginning reading program (described 
below), and then they receive 4 additional days of train- 
ing on assessment and on tutoring itself. Tutors are 
observed weekly by the program facilitator and given 
direct feedback on the sessions. 

A strong emphasis is placed on teaching compre- 
hension strategies. The tutor's goal is to get the students 
to read fluently, and to understand what they read. 
Tutors are trained to explicitly teach metacognitive 
strategies to help students monitor their comprehension. 
For example, a tutor will teach a student to stop at the 
end of each page and ask, "Did I understand what I just 
read?" The students learn to check their own compre- 
hension and to go back and reread what they did not 
understand. 

Each tutoring session is structured, but the tutor is 
constantly diagnosing and assessing the individual needs 
of each student and tailoring the sessions to fit the stu- 
dent's specific problem. If a student is having difficulty 
with fluency, the tutor will have the student do repeated 
reading aloud of a story. With similar materials, a tutor 
may work with another child on comprehension 
monitoring. 

A typical tutoring session begins with the student 
reading out loud a familiar story that he or she has read 
before in tutoring and in the reading class. This is fol-
lowed by a 1-minute drill of letter sounds to give the stu-
dent the opportunity to practice the letter sounds taught
in class. The major portion of the tutoring session is
spent on reading aloud "shared stories" that correspond
to the beginning reading lessons. The shared stories are
interesting, predictable stories that have phonemically



controlled vocabulary in large type and other elements 
of the story in small type. The teacher reads aloud the 
small-type sections to provide a context for the large- 
type portions read by the students. The tutor works with 
the student to sound out the phonemically regular 
words, asks comprehension questions about the whole 
story, and has the student reread passages out loud to 
gain fluency. Writing activities are also incorporated into 
the reading activities. 

As noted, the tutoring model is closely integrated 
with the reading program (Slavin, Madden, Karweit, 
Livermon, & Dolan, 1990), in which students are 
regrouped according to their reading levels. Use of tutors 
as reading teachers allows schools to reduce class size to 
about 15 students who are all at one level, so there are 
never multiple reading groups in any reading class. This
allows teachers to spend the entire class period actively 
teaching reading, as it removes the need for the follow- 
up or seatwork activities typical of classes with multiple 
reading groups. The beginning reading program empha-
sizes reading to students, engaging students in discus-
sions of story structure, and developing oral language
skills. Students begin using shared stories, as described 
earlier. As letter sounds and sound blending strategies 
are taught, students can apply them in their books. 
Students do a great deal of partner reading and pair 
practice activities, and writing is taught along with 
reading. 

There is a high degree of structure in the beginning 
reading program, which is helpful in integrating the 
classroom instruction with the tutoring session. 
Expectations for each lesson are clear, so the teacher 
and tutor can know that they are working on the same 
objectives. Integration is also facilitated by the use of 
brief tutor/teacher communication forms, on which each 
can tell the other about particular successes or problems 
a child is experiencing. 

Success for All is currently being evaluated in sev- 
eral schools in several school districts in six states. 
Evaluations most relevant to the tutoring aspect of the
program relate to low achievers in two Baltimore schools 
that have had adequate funding to provide a high level 
of tutoring services for several years. Abbottston 
Elementary, the original pilot school, has implemented
Success for All for 4 years. City Springs Elementary is a 
fully funded site whose implementation began a year 
after Abbottston. Each school was matched with a similar 
comparison school, and then students were individually 
matched on standardized reading measures. The student 
bodies at both Baltimore schools are almost entirely 
African American. Seventy-six percent of Abbottston's 
students qualify for free lunch. City Springs serves the 
most disadvantaged student body in the district; all its 
Table 6 Effects of Success for All on low achieving students



Measure 



Grade 1 
Woodcock Letter-Word
Woodcock Word Attack 
Durrell Oral Reading 
Durrell Silent Reading 
Mean 

Grade 2 
Woodcock Letter-Word
Woodcock Word Attack 
Durrell Oral Reading 
Durrell Silent Reading 
Mean 

Grade 3 
Woodcock Letter-Word 
Woodcock Word Attack 
Durrell Oral Reading 
Durrell Silent Reading 
Mean 



Abbottston site 



Year 1 



+0.42 
+ 1.34 
+0.99 

+ 1.01 



City Springs site 



Year 2 


Year 3 


Year 4 


Year 1


Year 2 


Year 3 


+ 1.57 


+1.09 


+2.40 


+0.08 


+ 1.03 


+0.57 


+4.22 


+1.00 


+1.30 


+0.51 


+ 1 T7 


+U. /I 


+1.97 


+0.21 


+1.79 


+ 1.14 


+0.23 


+0.37 


+1.73 


+1.06 




+0.47 


+0.4S 


+2.37 


+0.84 


+1.83 


+0.55 


+0.87 


+0.55 


+0.39 


+0.37 


+ 1.07 




+0.09 


+0.96 


+0.66 


+ 1.78 


+ 1.28 




+0.75 


+1.36 


+0.52


+0.71 


+0.87 




+0.28 


+0.9fl 


+ 1.26 


+0.64 






+0.16 


+0.71 


+0.88 


+ 1.07 




+0.32 


+1.11 




+0.57 


+ 1.22 






+0.20 




+ 1.22 


+2.70 






+0.50 




+1.11 


+ 1.82 






+0.78 




+1.36










+ 1.07 


+ 1.91 






+0.49 



Note. Data are effect sizes from Slavin, Madden, Karweit, Livermon, & Dolan, 1990; Slavin, Madden, Karweit, Dolan, & Wasik, 1990, April; Madden, Slavin, Karweit, Dolan,
Wasik, Shaw, Leighton, & Mainzer, 1991; and Madden, Slavin, Karweit, Dolan, & Wasik, 1992.



children come from housing projects, and 96% receive 
free lunch. Both are Chapter 1 schoolwide projects. Each 
May, students are individually assessed on scales from 
the Woodcock Language Proficiency Battery (1984) and 
the Durrell Analysis of Reading Difficulty (1980). 

The results for the students in Grades 1-3 who 
scored in the lowest 25% on the pretests are summarized 
in Table 6. The amount of tutoring received by these stu- 
dents varied depending on their needs; almost all 
received some tutoring, but in some cases they received 
8 weeks, while some second or third graders may have 
received more than a year of daily tutoring. 

The results shown in Table 6 indicate powerful 
effects of the combination of tutoring, curricular 
changes, and family support services used in Success for 
All. At both schools in all years, first-grade low achievers 
have scored better than their matched counterparts in 
control schools (mean effect size = +1.15). Second
graders who started in Success for All in the first grade
or earlier also scored substantially better than control
students (mean effect size = +.82), as did third graders in
the program for 3 years (ES = +1.16). These second- and
third-year effects should not be compared with the sec-
ond- and third-year effects of Reading Recovery; the
Reading Recovery data relate to the lasting effect of a
first-grade intervention, while those for Success for All
relate to the effects of continuing intervention. Although
effect sizes stayed at approximately the same level in



second and third grades as in first, this is an indication of 
a growing effect. Because standard deviations increase
each year, a constant effect size means a growing differ-
ence between experimental and control groups in grade
equivalents or raw scores. 
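The arithmetic behind this point can be sketched briefly. The example assumes the usual definition of an effect size as the difference between group means divided by a standard deviation. The standard deviations used are invented for illustration and are not values from the Success for All evaluations.

```python
# Illustrative only: the standard deviations below are hypothetical,
# not data from the studies reviewed in the text.

def raw_difference(effect_size: float, standard_deviation: float) -> float:
    """Convert a standardized effect size back to raw-score units.

    Assumes effect size = (experimental mean - control mean) / SD,
    so the raw difference equals effect size * SD.
    """
    return effect_size * standard_deviation


# Hypothetical raw-score SDs that grow from one grade to the next.
hypothetical_sd_by_grade = {1: 10.0, 2: 15.0, 3: 20.0}
constant_effect_size = 1.0

for grade, sd in hypothetical_sd_by_grade.items():
    gap = raw_difference(constant_effect_size, sd)
    print(f"Grade {grade}: ES = {constant_effect_size:+.2f}, raw gap = {gap:.1f} points")
```

With the effect size held fixed, the raw-score gap in this sketch widens in direct proportion to the standard deviation, which is the sense in which a constant effect size indicates a growing difference.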

In addition to effects on reading achievement, all 
three schools substantially reduced assignments of stu- 
dents to special education for learning problems and 
essentially eliminated retentions (Slavin, Madden,
Karweit, Livermon, & Dolan, 1992). 

As with Reading Recovery, there are methodologi- 
cal limitations to research on Success for All that may 
affect the results. First, because only one school was
involved in each comparison, school effects could 
account for part of the observed differences. Lack of ran- 
dom assignment of schools or students also could have 
affected the results.

The effects of Success for All were positive for the 
lowest achieving quarter of students involved as well as 
for the other students in the school (Slavin, Madden, 
Karweit, Dolan, & Wasik, 1990; Slavin, Madden, Karweit, 
Livermon, & Dolan, 1990; Madden et al., 1991, 1992).
However, the effects for the higher achieving students 
must be ascribed to the curriculum and other program 
elements, as few of them received any tutoring. Also it is 
important to note that schools using Success for All with- 
out extra resources for tutoring also obtained very posi- 
tive results, although not as positive as those for the fully 
funded schools (see Slavin et al., 1992). These schools
used their existing Chapter 1 funds to provide some 
tutoring (almost all to first graders), but could not sustain 
the amount of tutoring provided to Abbottston and City 
Springs low achievers. A school in Philadelphia used a 
modified version of Success for All to work with limited- 
English-proficient (LEP) Cambodian students, and also 
found very positive outcomes for these students and for 
non-LEP students (Slavin & Yampolsky, 1991). The eval- 
uation of Success for All shows the potential power of a 
tutoring program that is integrated with a structured 
reading program. Evaluations of additional years will be 
needed to determine whether the program's goal of suc- 
cess for every child is realistic. Follow-up studies are 
needed to determine the validity of the program's 
assumption that success through the elementary grades 
will have long-term consequences, but the data collected 
to date clearly demonstrate the program's effectiveness 
when used at the beginning of students' school careers. 

Like Reading Recovery, Success for All articulates 
instruction with assessment. The Word Identification, 
Letter Identification, and Word Attack subtests of the 
Woodcock assess letter and word knowledge and phon- 
ics skills. The Durrell Oral Reading test asks children to 
read passages in a limited amount of time and answer 
comprehension questions. Given the emphasis on 
decoding, oral language, perceptual analysis, reading 
strategies, and prose comprehension, these measures 
correspond with the model of reading in the Success for 
All program. 

Like children in Reading Recovery, children in
Success for All may be more familiar with the items that 
they are being evaluated on than are the controls 
because of the relationship between instruction and 
assessment. 

Finally, unlike the verification of Reading Recovery, 
the fidelity of program implementation in Success for All 
has not been documented. It would be important to 
know whether the model of reading is being appropri- 
ately implemented and if tutoring sessions look similar 
across sites. Also, if there is no consistency of delivery of 
instruction across tutoring sessions, what do the effect 
sizes mean? Qualitative implementation data need to be
collected in order to validate the consistency in delivery 
of instruction and to determine how this translates into 
increased reading performance.

Prevention of Learning Disabilities 

Prevention of Learning Disabilities is a program 
developed by the Learning Disorders Unit of the New 
York University Medical Center that identifies first and 
second graders who are at risk for school failure and 



provides intensive instruction before they begin to fall 
behind in basic skills. Students involved in the program 
are screened in first grade using an instrument (SEARCH) 
that focuses primarily on neurological indicators of learn- 
ing disabilities and on perceptual and general immaturi- 
ty. Using diagnostic information from SEARCH, first 
graders are given lessons either individually or in small 
groups that attempt to strengthen their areas of percep- 
tual weakness. The instructional interventions, called 
TEACH, are designed primarily to build perceptual skills, 
such as recognition, discrimination, copying, and recall,
and are administered by certified teachers in 30-minute 
sessions three to five times per week. 

Model of reading 

Unlike Reading Recovery and Success for All, the 
Prevention of Learning Disabilities tutoring program is 
based on a physiological view of learning and learning 
disorders. As to reading itself, Silver and Hagin's (1990) 
model is based on the assumption that reading is a 
"complex process that must be analyzed according to 
component skills in order to understand the learning dif- 
ficulty." However, these authors take a very atomistic 
view of reading and teaching reading. In compartmental- 
izing these reading skills, the goals are to identify those 
components with which the child is having difficulty and 
to teach to those specific skills. 

Silver and Hagin propose that children need to 
have four skills in order to read: prereading skills, word 
attack skills, comprehension, and study skills. Prereading 
skills include the visual discrimination of letters, recogni- 
tion of symbols in their correct orientation, the ability to 
organize symbols into groups, and several auditory 
skills. Word attack skills involve not only the use of 
phonics to figure out words, but also the identification of 
whole words using visual cues such as letter combina- 
tions. Comprehension involves having a rich vocabulary, 
being able to select the right meaning of a word, and
making inferences. Study skills are described as the tools 
for acquiring information. These skills enable children to 
locate and select relevant elements within a sequence 
and organize the content of the text for later recall based 
on the goal of reading. 

Although Silver and Hagin proposed these compo- 
nents of reading, not all of these aspects are directly 
taught in the program. There is considerable emphasis 
on matching, copying, and recalling individual letters 
and words, and little emphasis on reading for meaning. 
Phonics are not systematically presented, but instead let- 
ter-sounds are reinforced. In total, the Prevention of 
Learning Disabilities program includes in its model the 
following components of reading: perceptual analysis of 
print, decoding, and oral language proficiency.

Table 7 Effects of Prevention of Learning Disabilities
on at-risk students

Measures                          Effect sizes

Silver & Hagin, 1979;             End of      End of      Grade 3
Silver et al., 1981               Grade 1     Grade 2     Follow-up

SEARCH (Perception)               +.99        —           —
WRAT (Oral Reading)               +.85        +1.06       +.95
Woodcock Word Identification      +.94        +.91        +1.38
Woodcock Word Attack              +1.39*      +1.67       +1.26
SRA Comprehension                 —           +.95        —
Gates-MacGinitie Comprehension    —           —           +.30
Gates-MacGinitie Vocabulary       —           —           +.15

Arnold et al., 1977               TEACH vs. control       Reg. tutoring vs. control
                                  End of      End of      End of      End of
                                  Grade 1     Grade 2     Grade 1     Grade 2

WRAT                              +.33        +1.09       +.16        +.11

Mantzicopoulos et al., 1990       TEACH vs. control       Phonetic vs. control
                                  End of      End of      End of      End of
                                  Grade 1     Grade 2     Grade 1     Grade 2

Total reading achievement         +.16        +.21        +.28        +.13
(combined SAT, CTBS, CAT)

Structure of tutoring 

No coordination with the regular reading program 
is apparent in program descriptions. Children come to 
tutoring for 30 minutes, 3 to 5 days a week. Tutors, who 
are certified teachers, work on perceptual skills such as 
recognition, discrimination, copying, and recalling of 
information. There is no emphasis on reading connected 
text and no systematic presentation of phonics. 

Results 

An evaluation of Prevention of Learning Disabilities
was conducted in inner-city New York City classrooms
(Hagin, Silver, & Beecher, 1978; Silver & Hagin, 1979).
Students were randomly assigned to experimental or 
control classes, and those in the experimental group 
received TEACH instruction for 2 years. Table 7 summa- 
rizes the findings. On reading measures as well as on 
perception measures, the experimental students per- 
formed substantially better than controls. In the same 
study, Silver and Hagin (1979) found that students who 
had a full year of TEACH performed better than those 
who had only a half year. 

In a similar study, Silver, Hagin, and Beecher 
(1981) found that third graders who received the TEACH 
intervention in first and second grade showed signifi- 
cantly greater performance in oral reading, word identifi- 
cation, and word attack skills (a measure that assesses 



the ability to sound out words) when compared to a no- 
treatment control group. 

Arnold et al. (1977) replicated the Prevention of 
Learning Disabilities program in inner-city and middle-
class schools in Columbus, Ohio. Using SEARCH, 86 first
graders were identified as being at-risk for reading prob- 
lems and were assigned to one of three groups: the 
TEACH intervention group, a group who received acade- 
mic tutoring from a teacher, and a no-treatment control 
group. Students in the TEACH and regular tutoring 
group received tutoring for 30-minute sessions twice a 
week. Table 7 summarizes the findings. At the end of 
one year, the effects for both the TEACH intervention 
and the regular tutoring were minimal on the WRAT 
achievement test. However, at the end of the second 
year of the intervention, students in the TEACH group 
performed significantly better than the students in the
regular tutoring and the no-treatment control group on 
the WRAT. 

A more recent study by Mantzicopoulos, Morrison, 
Stone, and Setrakian (1990) found few effects for the 
TEACH intervention. In this study, first graders who 
were identified as at risk for reading failure by the 
SEARCH screen were assigned to three groups: a TEACH 
group, a phonics tutoring group, and a no-contact con- 
trol group. In the phonics tutoring group, students were 
given phonics instruction, were drilled in phonics, and 
read phonemically regular books. This is in contrast to 
the TEACH group, which worked on visual-auditory dis- 
crimination activities. In both the TEACH and phonics 
tutoring groups, students received one-to-one tutoring 
for 30-minute sessions twice a week. The findings are 
summarized in Table 7. 

On reading measures and perceptual measures, 
students in the TEACH group did not perform any differ- 
ently than the phonics tutoring group or the no-contact 
controls. Not surprisingly, the phonics tutoring group did 
show some significant improvement in word attack 
skills, compared to the no-contact control. 

Mantzicopoulos et al. (1990) suggest that one rea- 
son for the disappointing effects of TEACH was that the 
high attrition rate of their students produced a skewed 
sample distribution. Attrition, of course, is a factor in 
working with at-risk populations. 

As in Reading Recovery and Success for All, the 
measures used to assess this program are consistent with 
the model of reading and the instruction delivered in 
tutoring. The Word Identification and Word Attack sub- 
tests of the Woodcock and the Gates-MacGinitie 
Vocabulary test assess letter, letter-sound knowledge, 
and word knowledge. The WRAT (oral reading), SRA 
comprehension, and the Gates-MacGinitie Comprehension 
scales assess reading connected text and comprehension, 
although these are multiple-choice tests and 
do not assess on-line reading as do the Text Reading 
Level of Reading Recovery and the Durrell Informal 
Reading Inventory. Students in Prevention of Learning 
Disabilities are provided with minimal instruction in 
reading and answering comprehension questions. 

A final concern about the findings from Prevention 
of Learning Disabilities is that, as with Success for All, 
there is no information about program implementation. 
There is no way of determining if the program imple- 
mented by Mantzicopoulos et al. (1990) was the same as 
Silver and Hagin's program or if variability in program 
implementation produced these different results. 
Qualitative implementation data need to be collected in 
order to determine if there is consistency across tutoring 
sessions and if what is proposed in the model of reading 
is being carried out in instruction. 

Wallach Tutoring Program 

The Wallach Tutorial Program (Wallach & Wallach, 
1976) is, like Reading Recovery and Success for All, 
based on the idea that students who fail to learn to read 
in first grade are seriously at risk, and that carefully 
structured tutoring intervention can prevent reading fail- 
ure. In this model, students receive 30 minutes of tutor- 
ing per day for a year. Unlike Reading Recovery and 
Success for All, the Wallach model uses paraprofession- 
als as tutors. The tutoring is directed to students who 
score below the 40th percentile on a standardized read- 
ing test. 

Model of reading 

According to Wallach and Wallach (1976), reading 
is a skill that "can be broken down into component 
parts; that these parts can be arranged in a cumulative, 
hierarchical manner such that learning of the latter parts 
builds systematically upon what has been learned 
already" (p. 56) and can be best learned by "systematically 
cumulating the mastery of component subskills" (p. 
77). Wallach and Wallach propose that in acquiring 
these subskills, children must first establish competence 
"in the recognition and manipulation of sounds," and 
then acquire skill "in the use of the alphabetic code and 
in blending." Finally, they need to effectively apply 
"these competencies in reading printed material." 

Because of this skills-mastery approach to reading, 
phonics are systematically presented in this program. 
Unlike Success for All, the phonics lessons are not coordinated 
with emphasis on reading connected text. 
Instead, reading comes after the letter sounds have been 
learned. Also, little consideration is given to metacognitive 
strategies. From Wallach and Wallach's point of 
view, the goal is to teach the students skills that they 
need to be readers. No attention is given to finding out 
the kinds of strategies the students are using and teaching 
new, more successful strategies. 

Table 8 Grade equivalent differences and effect sizes 
for Wallach Tutoring Program 

Measures                          Grade equivalent    Effect sizes 
                                  differences 

Tutored vs. matched control: Wallach & Wallach (1976) 
  Spache Word Recognition         *+.5                +.64 
  Spache Consonant Sounds Test    --                  +.66 

Tutored vs. control group: Dorval, Wallach, & Wallach (1978) 
  Spache Word Recognition         *+1.6 to 1.8        -- 
  Spache Reading Passages         --                  -- 
  CTBS                            --                  +.75 

*Computation based on median scores 

Much of this emphasis on the relationship of 
sound-symbol is a response to a finding in an earlier 
study (Wallach, Wallach, Dozier, & Kaplan, 1977), which 
indicated that at the end of kindergarten, most of a sam- 
ple of disadvantaged students but few middle-class stu- 
dents had difficulty recognizing phonemes in words read 
to them, such as knowing that man but not house starts 
with the mmm sound. Wallach and Wallach argue that 
disadvantaged students need to be explicitly taught letter 
sounds so they can serve as a foundation for acquiring 
other skills necessary for learning to read. 

In total, Wallach and Wallach include only the following 
components of reading in their model: perceptual 
analysis, decoding, and oral language proficiency. 
They apparently believe that once the code has been 
cracked, oral language processes take over. 

Structure of tutoring 

The Wallach and Wallach program has three parts. 
For about 10 weeks, children are taught to recognize ini- 
tial phonemes in words read to them, to recognize let- 
ters, and to associate letters and phonemes. In the sec- 
ond stage, students spend 2 to 3 weeks learning to 
sound out and blend easy words. For the remainder of 
the year, the children learn to apply their skills to classroom 
reading materials. Thus, the Wallach model begins 
as a completely separate tutoring program (like Reading 
Recovery) but later begins to integrate tutoring with 
classroom instruction (like Success for All). 

Two studies have evaluated the Wallach model. 
The results of these studies are summarized in Table 8. 

Table 9 Effects of Programmed Tutorial Reading 

                                        Effect sizes 

                               Programmed tutoring      Directed tutoring 
                               15 min.     30 min.      15 min.     30 min. 
Measures                       per day     per day      per day     per day 

Ellson et al. (1968) 
  Ginn Total Vocabulary        +.09        +.57         +.23        -.07 
  Ginn Total Comprehension     +.13        +.53         +.10        -.21 
  Ginn Total Word Analysis     -.19        +.46         +.28        -.01 
  Stanford Achievement Test    +.01        +.18         +.41        -.17 

Ellson et al. (1965) 
  Total Ginn Score             +.33 
  Total Word Analysis Score    +.36 
  Word Recall Score            +.78 

McCleary (1971) 
  Ginn Achievement 
  (all students)               +.40 
  Ginn Achievement 
  (low achievers only)         +.37 


The first evaluation was a field test in two inner- 
city Chicago schools (Wallach & Wallach, 1976). First 
graders who were identified at the beginning of the 
school year as low in "academic readiness" were ran- 
domly assigned to either tutoring or a no-treatment 
control. 

On the Spache Word Recognition Scale, the tutored 
students scored 5 months higher than the control (G.E. 
1.8 vs. 1.3) with an effect size of +.64. On the Spache 
Consonant Sounds Test, the tutored students also outperformed 
the control group, with an effect size of +.66. On 
the Spache Reading Passage scales, there were apparent 
differences favoring the tutored students but these were 
obscured by a floor effect on the test (which does not 
measure below a grade equivalent of 1.6). 

A second study (Dorval, Wallach, & Wallach, 1978) 
evaluated the program in rural Roanoke Rapids, North 
Carolina. Students who received the tutoring were com- 
pared to similar students in the same school the previous 
year, to similar students in a comparison school who 
received the services of a full-time reading aide in their 
regular reading class, and to other students in the same 
comparison school who received neither tutoring nor 
aides. At the end of the year, students took the group- 
administered CTBS and were individually assessed on 
the Spache Word Recognition and Reading Passages 
scales. The various control groups did not differ from 
one another, so they can be pooled. 

On the Spache Word Recognition Scale, the tutored 
students scored 8 months higher than controls (grade 
equivalent 2.3 vs. 1.5). On the Spache Reading Passages, 
the tutored students were reading at a median grade 
equivalent of 1.8, while control students were at a median 
of 1.6, but again a floor effect may account for this 
small difference. On the CTBS, tutored students scored at 
the 56th percentile, comparison students at the Mth, for 
an effect size of +.75. 

Like the other programs, the measures used in 
assessing the Wallach and Wallach program match the 
model of reading and instruction. The Spache Word 
Recognition Test and the Spache Consonant Sound Test 
correspond to the emphasis on perceptual analysis and 
decoding. The CTBS also has sections that require word 
identification, analysis, and comprehension. 

Again, there are no implementation data on the 
Wallach tutoring program. It would be important to 
know if the different effect sizes found in Wallach and 
Wallach (1976) compared to Dorval, Wallach, and 
Wallach (1978) are the result of implementation differences 
or other program factors. The differences found 
on the Spache Word Recognition test and Spache 
Reading Passages in the two studies are large enough that 
differences in program implementation need to be 
considered. 

Programmed Tutorial Reading 

Programmed Tutorial Reading is a highly structured 
tutoring program used with first graders who are in the 
lowest quartile on standardized reading tests. The pro- 
gram was originally developed by Douglas Ellson at 
Indiana University. The tutors for the program are paid 
paraprofessionals, volunteers, or parents. Students are 
tutored 15 minutes per day as a supplement to regular 
classroom instruction. 

Model of reading 

Ellson and his colleagues describe reading as "a 
complex activity that, at minimum, includes oral or sight- 
reading, phonics, and comprehension" (Ellson, Barber, 
Engle, & Kampwerth, 1965, p. 80). They describe a hier- 
archy in the important components of teaching reading 
in which sight reading has priority, followed by a phone- 
mic analysis and synthesis, which they describe as not 
being necessary at the start, and, finally, comprehension 
of the meaning of visually presented words or sentences. 

Of all the programs described, Programmed 
Tutorial Reading is the least explicit regarding its model 
of reading. This is partly due to the fact that Ellson and 
his colleagues were primarily interested in testing the 
structure of this individualized programmed instruction 
and intended to extend this model to other content areas 
such as math. The only components of reading that 
appear to be part of this model are perceptual analysis 
and decoding. They present no clear explanation or indication 
of how comprehension is taught. 

The curriculum in Programmed Tutorial Reading is 
designed on the principles of operant conditioning. 
Students proceed by mastering small, sequential steps in 
the reading process and are reinforced for correct 
responses. The primary emphasis is on acquiring sight 
words. Phonics is also systematically presented in the 
context of acquiring words. There is no emphasis on 
reading words in connected text with the goal of learn- 
ing new words and learning to comprehend what is 
read. Mastering individual components is expected to 
build a repertoire of behaviors that coordinate into read- 
ing. How this occurs is not explicitly stated. 

Structure of tutoring 

Students are cycled through a sequence of lessons 
on sight-reading, comprehension, and word analysis, 
which is repeated many times. Tutors are trained in spe- 
cific strategies to present items, reinforce students for 
correct responses, and route students through the materi- 
als according to their responses. There is no coordina- 
tion between tutoring and classroom instruction. 

Results 

Several studies have evaluated Programmed 
Tutorial Reading, but only three of these have compared 
the program to control groups over meaningful time 
periods with nonretarded populations. Table 9 summa- 
rizes the results of these studies. 

Ellson, Harris, and Barber (1968) evaluated 
Programmed Tutorial Reading of two durations, over a 
full school year. Students were assigned to one of four 
tutored groups: Programmed Tutorial Reading for 15 
minutes per day, Programmed Tutorial Reading for 30 
minutes per day, an alternative tutoring program called 
Directed Tutoring for 15 minutes per day, and Directed 
Tutoring for 30 minutes per day. Then, a matched stu- 
dent was identified within the classroom of each tutored 
student. The students were first graders in 20 
Indianapolis schools. Most of the schools served low- 
income populations, but the students were selected to 
be representative of their schools and did not necessarily 
have reading problems. The Directed Tutoring program 
did not use the programmed materials or highly struc- 
tured procedures used in Programmed Tutorial Reading, 
but used remedial and supplementary materials more 
like those typically used in classrooms or in remedial 
reading programs. 

The results (see Table 9) indicate strong effects of 
the 30-minute Programmed Tutorial Reading Program on 
tests provided along with students' Ginn basals (mean ES 
= +.52), but effects on the standardized Stanford 
Achievement Test were near zero, as were overall effects 
of the 15-minute per day program. Small positive effects 
were found for the 15-minute per day Directed Tutoring 
program, but (oddly) effects of the 30-minute Directed 
Tutoring treatment were slightly negative. Another study, 
by Ellson, Barber, Engle, & Kampwerth (1965), compared 
15 minutes per day of Programmed Tutorial 
Reading for a semester to an untreated control group. In 
this case, moderate positive effects were found on the 
three measures used. 
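
The mean effect size of +.52 cited for the 30-minute program can be checked against the three Ginn entries in Table 9. The short Python sketch below is only an illustration of that arithmetic; the variable names are ours, and the values are copied from the 30-minute Programmed Tutoring column for Ellson et al. (1968).

```python
from statistics import mean

# Effect sizes for the 30-minute Programmed Tutorial Reading condition on the
# three Ginn measures, as listed in Table 9 (Ellson et al., 1968).
ginn_effect_sizes = {
    "Ginn Total Vocabulary": 0.57,
    "Ginn Total Comprehension": 0.53,
    "Ginn Total Word Analysis": 0.46,
}

# Unweighted mean across the three Ginn measures.
mean_es = mean(list(ginn_effect_sizes.values()))
print(f"Mean Ginn effect size: {mean_es:+.2f}")
```

The unweighted mean of the three Ginn entries rounds to the +.52 quoted in the text; the Stanford Achievement Test entry is left out because the text reports it separately.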

The largest methodologically adequate study of 
Programmed Tutorial Reading was done by McCleary 
(1971) in Lenoir County, North Carolina. In this study, 
low-achieving first graders were matched and assigned 
to experimental or control groups. The experimental students 
were tutored for the entire school year for 15 minutes 
per day. Positive effects on the Ginn reading test 
were found for the sample as a whole (ES = +.40) and 
for the poorest readers (ES = +.37). In addition, retentions 
in first grade were 55% lower in the tutored group 
than in the nontutored group. Taken together, the evaluations 
of Programmed Tutorial Reading suggest that the 
program has positive effects on student reading achievement, 
but the effects are smaller and less consistent than 
those for the programs that use certified teachers. 

Two issues need to be addressed regarding the 
Programmed Tutorial Reading project. The study done by 
Ellson and his colleagues found different results than the 
McCleary study. These differences could be due to the 
fact that McCleary had a better experimental design than 
Ellson et al. However, another explanation which should 
be considered is that the differences found were the 
result of differences in implementation of the program at 
different sites. This is only speculative since implementation 
data are not available on either study. 

In Programmed Tutorial Reading, the assessment 
used to measure outcomes matches the model of read- 
ing in the program. The Ginn Total Word Analysis and 
Vocabulary tests assess word identification and decoding 
skills, two skills that are emphasized in this program. 
The Ginn Total Comprehension consists primarily of 
reading short sentences or passages and answering ques- 
tions. This section of the Ginn primarily assesses lower 
level comprehension skills that are taught specifically in 
this program. Unlike the comparison group, students in 
the Programmed Tutorial Reading group were taught 
tasks in tutoring which were similar to those used in 
evaluating the program. 

Discussion 

One-to-one tutoring of low-achieving primary-grade 
students shows potential as an effective instructional 
innovation. Across 16 separate studies of cohorts 
involving five different tutoring methods, effect sizes 
were substantially positive in nearly every case. 

The five tutoring programs discussed here vary 
enormously in models of reading, curriculum, tutoring 
methods, duration, integration with regular classroom 
instruction, and other characteristics. The studies are 
equally diverse in populations, measures, and proce- 
dures. However, some patterns can be perceived. 

First, programs with the most comprehensive mod- 
els of reading, and therefore the most complete instruc- 
tional interventions, appear to have larger impacts than 
programs that address only a few components of the 
reading process. Reading Recovery and Success for All 
include in reading instruction several components of 
reading such as perceptual analysis, conventions of 
print, error correction strategies, decoding, comprehen- 
sion, error detection, and reading strategies. Moreover, 
they have comprehensive approaches to teaching the 
complex process of reading. In contrast, the Prevention 
of Learning Disabilities program which focuses only on 
building specific skills related to the reading process pro- 
duced less consistent comprehension outcomes. 

Second, it is not enough that programs simply use 
tutors. The content of the reading program in addition to 
the form of instructional delivery may be important vari- 
ables. Ellson et al. (1968), for example, found the 
Programmed Tutorial Reading model to be significantly 
more effective than a standard "directed tutoring" inter- 
vention, and Arnold et al. (1977) found the Prevention 
of Learning Disabilities (TEACH) program to be con- 
siderably more effective than "regular tutoring." 
Mantzicopoulos et al. (1990) failed to replicate the find- 
ings of the earlier studies of Prevention of Learning 
Disabilities, but similarly found few effects of a "stan- 
dard" phonics-based tutoring approach. An Ohio 
statewide study of Reading Recovery failed to find any 
positive effects of two alternative models of one-to-one 
tutoring (Pinnell et al., 1991). These findings, plus the 
apparent advantage of tutoring by certified teachers over 
tutoring by paraprofessionals, provide support for the 
proposition that for tutoring to be maximally effective it 
must improve the quality of instruction, not only 
increase the amount of time, incentive value, and appro- 
priateness to students' needs (see Wasik & Slavin, 1990). 

Third, programs using certified teachers as tutors 
appeared to obtain substantially larger impacts than 
those using paraprofessionals. Effect sizes for 
Programmed Tutorial Reading and the Wallach Tutorial 
Program generally fell in the range of +.20 to +.75, while 
those for the programs using certified teachers produced 
average effects from +.55 to +2.37 by the end of first 
grade. The teacher-delivered and paraprofessional-delivered 
models also differed in curriculum. Both the 
Wallach model and Programmed Tutorial Reading use 
highly structured, clearly described instructional materi- 
als, which in the latter program were explicitly patterned 
on programmed instructional methods usually designed 
for self-instruction. In contrast, the three teacher- 
administered models rely on teachers' judgment, flex- 
ibility, and knowledge of how children learn. 

Only one program, Success for All, is designed to 
integrate completely with regular classroom instruction, 
and this program also produced some of the largest 
effect sizes. Although coordinating the tutoring sessions 
with classroom instruction is sensible in theory, empiri- 
cal data need to be collected to determine its impor- 
tance. The type of classroom instruction with which the 
tutoring was coordinated would also be an important 
factor. In addition, lack of consistency between how 
reading is presented in the classroom and how it is pre- 
sented in tutoring may present a mismatch in the way 
reading is taught and result in confusion for the children. 
However, if Reading Recovery and Programmed Tutorial 
Reading were used both in the classroom and in tutor- 
ing, Reading Recovery might still have greater effects 
because its model of reading and delivery of instruction 
may be more effective. All of this remains to be deter- 
mined in additional studies. 

Several studies evaluated the cumulative and last- 
ing effects of one-to-one tutoring in the early grades. 
Studies of two Success for All schools (Slavin et al., 
1992) found that as students continued into second and 
third grades, initial positive effects continued to grow. 
Similar cumulative effects were found for Prevention of 
Learning Disabilities in two studies (Silver & Hagin, 
1979; Arnold et al., 1977) but not in a third 
(Mantzicopoulos et al., 1990). Silver & Hagin (1979) also 
found that students who experienced Prevention of 
Learning Disabilities for a full year learned more than 
those who had it for a semester, and Ellson et al. (1968) 
found that gains were greater when students received 30 
minutes per day of Programmed Tutorial Reading than 
when they received only 15 minutes. 

Because one-to-one tutoring (especially by a certi- 
fied teacher) is expensive, the lasting effects of this 
approach are of great importance. Reading Recovery has 
been evaluated for lasting effects, and the results are 
positive but complex. On one hand, the raw score gains 
that students made on Text Reading Level in first grade 
have been maintained through the end of third grade in two 
different cohorts (DeFord et al., 1988; Pinnell, 1988). On 
the other hand, because standard deviations of this mea- 
sure increase each year, effect size estimates have dimin- 
ished each year for both cohorts. A 1-year follow-up of 
Prevention of Learning Disabilities showed consistently 
positive effects for the third graders for most measures 
with the exception of performance on the Gates- 
MacGinitie Comprehension test. The effects for reading 
comprehension decreased 1 year after the intervention. 
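
The observation that effect sizes can shrink while raw score gains hold steady follows from how an effect size is computed. The Python sketch below illustrates this under two loud assumptions: the effect size is taken to be the difference in group means divided by the control group's standard deviation (a common convention, not a definition given in this section), and every number in it is hypothetical rather than drawn from the Reading Recovery data.

```python
def effect_size(treatment_mean: float, control_mean: float, control_sd: float) -> float:
    """Standardized mean difference (assumed definition, for illustration only)."""
    return (treatment_mean - control_mean) / control_sd


# Hypothetical constant raw advantage of tutored students, in test score points.
RAW_GAIN = 2.0

# Hypothetical control-group (mean, standard deviation) pairs. The standard
# deviation grows each year, as the text reports for Text Reading Level.
hypothetical_controls = {
    "grade 1": (4.0, 2.5),
    "grade 2": (10.0, 4.0),
    "grade 3": (16.0, 6.0),
}

effect_sizes = {
    grade: effect_size(control_mean + RAW_GAIN, control_mean, control_sd)
    for grade, (control_mean, control_sd) in hypothetical_controls.items()
}

for grade, es in effect_sizes.items():
    print(f"{grade}: raw gain = {RAW_GAIN:.1f}, effect size = {es:+.2f}")
```

With the raw gain fixed, the effect size falls each year only because the denominator grows, which is the pattern described for the two Reading Recovery cohorts.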

Two of the tutorial programs, Success for All 
(Slavin et al., 1992) and Programmed Tutorial Reading 
(McCleary, 1971) documented substantial reductions in 
retentions as a result of first-grade tutoring, and Success 
for All (Slavin et al., 1992) also showed reductions in 
special education referrals. 

Is tutoring cost effective? 

It should not come as a surprise that one-to-one 
tutoring of primary grade students is effective. A more 
important question is whether it is effective enough to 
justify its considerable cost. One way to address this 
question is to compare tutoring to other expensive inter- 
ventions. For example, experiments in Tennessee, New 
York City, Toronto, and Indiana have reduced class size 
by almost half. This is the same as hiring an additional 
teacher for each class, who could instead be used to 
provide one-to-one tutoring for 20 minutes per day to 
about 15 students. The best and most successful of these 
class-size experiments, a Tennessee statewide study, 
found a cumulative effect of substantially reducing class 
size from kindergarten to third grade of about +.25 
(Word et al., 1990), less than that found in any of the 
tutoring models. A follow-up study 1 year later found 
lasting effects of 4 years of small classes to be only +.13 
(Word et al., 1990). Other studies of halving class size 
have found even smaller effects (Slavin, 1989). The 
effects of having aides work in the classroom have been 
found to be minimal in many studies (see Scheutz, 1980; 
Slavin, in press); the same aides could be used as tutors 
using models designed for that purpose, or could be 
replaced by teachers for a greater impact. 
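
The comparison between an additional classroom teacher and a tutor rests on simple arithmetic, sketched below in Python. The two inputs (20 minutes per day, about 15 students) come from the paragraph above; the variable names are ours, and the sketch ignores transition and planning time, which the text does not quantify.

```python
# Figures stated in the text: one additional teacher could tutor about
# 15 students one-to-one for 20 minutes per day each.
STUDENTS_PER_TUTOR = 15
MINUTES_PER_SESSION = 20

# Total daily tutoring time implied by those two figures.
total_minutes = STUDENTS_PER_TUTOR * MINUTES_PER_SESSION
total_hours = total_minutes / 60

print(f"Daily tutoring load: {total_minutes} minutes ({total_hours:.1f} hours)")
```

The implied load of 300 minutes, or 5 hours, of daily contact time is roughly one teacher's instructional day, which is why the text treats halving class size and providing a tutor as comparable investments.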

On the other hand, it is not yet established that a 
heavy investment in first grade will pay off in permanent 
gains for at-risk students. The Reading Recovery and 
Prevention of Learning Disabilities results hold out some 
hope for lasting gains, and the cumulative effects of 
Success for All also show promise for maintaining initial 
gains. Reductions in retentions and special education 
referrals, seen in two of the tutoring models, have both 
immediate and long-term impacts on the costs of educa- 
tion for low achievers. Substantial savings due to 
reduced retentions and special education placements 
have been shown for Reading Recovery (Dyer, 1992) 
and for Success for All (Slavin et al., 1992). However, if 
first-grade tutoring models prove to have long-term 
effects either without additional intervention (as in 
Reading Recovery) or with low-cost continuing intervention 
(as in Success for All), cost effectiveness will not be 
the only criterion for deciding to use these models. For if 
we know that large numbers of students can be success- 
ful in reading the first time they are taught, and that the 
success not only lasts but also builds a basis for later 
success, then educators and legislators may perceive an 
obligation to do whatever it takes to see that all students 
do in fact receive that which is necessary for them to 
succeed. 

Future research 

In many ways, research on preventive tutoring 
models is in its infancy. Although the studies reviewed 
here clearly indicate a strong positive effect of well- 
designed tutoring models, there are many important 
issues to be understood. 

On the programmatic side, one important set of 
questions concerns how much reading failure can be 
prevented using resources short of one-to-one instruc- 
tion by certified teachers. Could one-to-two or one-to- 
three instruction be nearly as effective? Could forms of 
tutoring using paraprofessionals be devised that would 
be nearly as effective as forms requiring certified teach- 
ers? Must tutoring be done daily, or could it be done less 
frequently? How much time must be allotted to tutoring 
each day? These issues need to be empirically tested. 

More work is clearly needed on long-term effects 
of tutoring, not only on achievement but also on special 
education referrals and need for long-term remediation, 
critical elements in any consideration of cost effective- 
ness. Also, studies of alternative approaches to tutoring 
are needed. Successful models range from the phone- 
mic, rigidly prescribed Programmed Tutorial Reading, to 
the ''learning to read by reading" emphases of Reading 
Recovery and Success for All, to the focus on specific 
perceptual deficits of Prevention of Learning Disabilities. 
In the studies, it may be that each of these types of 
approaches would be successful with different children, 
and that someday we may know which type of program 
will work best with children of a given profile. 

The issue of selection of assessment measures 
based on what is being taught in the programs has been 
discussed. Recently, researchers have been calling for 
more authentic measures for assessing what children 
learn. One possible way of trying to establish some 
understanding across programs would be to assess chil- 
dren in each program on the same measures, both stan- 
dardized tests and perhaps more importantly, ongoing 
literacy performance measures (see Taylor, 1990). This 
information would allow some cross-program compar- 
isons and also help in determining generalizability of 
what is taught to other forms of assessment. 

A great deal of work is needed to understand why 
tutoring is effective. The rudimentary explanation offered 
in this article must be replaced by a far more sophisticat- 
ed understanding of cognitive and motivational process- 
es activated in tutoring that are not activated to the same 
degree (at least for at-risk children) in the regular class- 
room. Understanding how at-risk children learn to read 
in tutoring would contribute to an understanding of how 
at-risk children learn in general; the tutoring setting pro- 
vides an ideal laboratory in which the process of learn- 
ing to read can be observed as it unfolds over time. 
Microanalysis of tutor/child discourse could contribute to 
our understanding of how children learn to read (Green 
& Weade, 1985; Handerhan, 1990). 

This qualitative understanding of tutoring would 
also help address the important issue of implementation 
across tutoring sessions. Only Reading Recovery has 
attempted to assess implementation and the effect this 
has on outcome data. Understanding how instruction is 
delivered will also help in tutor training. Every tutor is 
different and brings to the tutoring session his or her 
own unique understanding of that child and reading. 
However, each program has specific prescribed theories 
of reading and how these theories translate into practice. 
The goal is to ensure that instruction is in concert with 
the model of reading and is consistent from one tutoring 
session to the next. 

As discussed in the introduction of this article, it 
would have been helpful to discuss the differences in 
each program's theory of reading. However, this was not 
possible because only Reading Recovery has made 
attempts at outlining a clear, coherent theory of reading. 
Instead, these programs take a pragmatic approach; that 
is, the evaluations focus on producing data to indicate 
that the programs work, not why they work. However, 
articulating a theory of reading based on empirical evi- 
dence is a valuable contribution to the field of reading. 
This area is ripe for theory development. It would be 
important to begin to understand how the interaction 
between the tutor and the student results in learning to 
read. Clarifying a theory of reading would add to a fun- 
damental understanding of why the components includ- 
ed in a particular program make the program effective. 

Finally, several limitations to this type of research 
synthesis need to be addressed. First, when only tutoring 
programs are reviewed, research on other effective inter- 
ventions for preventing reading failure is not addressed. 
Other classwide reading programs also have shown 
some effectiveness. Also, in a best evidence synthesis, 
programs are grouped together and examined in terms 
of effectiveness. However, each program has very dis- 
tinct characteristics and has been tested on different 
populations. Although we have attempted to look at 
some specific similarities and differences among programs, 
a review of this kind does not examine the 
nuances of each program nor does it address the qualitative 
differences that exist in the tutor-child dyad. Also, to 
test the relative effectiveness of these programs, studies 
need to be conducted in which children are randomly 
assigned to alternative programs and a control group, 
and in which the children's success on a variety of mea- 
sures is assessed. 

Although we want to know much more about how 
tutoring works and how to maximize its effectiveness 
(and minimize its cost), it appears from the research 
reviewed in this article that one-to-one tutoring is a 
potentially effective means of preventing student reading 
failure. As such, preventive tutoring deserves an impor- 
tant place in discussions of reform in compensatory, 
remedial, and special education. If we know how to 
ensure that students will learn to read in the early 
grades, we have an ethical and perhaps legal responsi- 
bility to see that they do so. Preventive tutoring can be 
an alternative for providing a reliable means of abolish- 
ing illiteracy among young children who are at risk for 
school failure. 

Responsive Practices in the Middle 
Grades: Teacher Teams, Advisory 
Groups, Remedial Instruction, and 
School Transition Programs 

DOUGLAS J. MAC IVER and JOYCE L. EPSTEIN 
Johns Hopkins University 

In this article, we analyze data obtained from "Education in the Middle 
Grades," a national survey of practices and trends using a representative 
sample of principals in public schools that contain grade 7, to examine 
the use and perceived effects of practices that are believed by many 
educators to be especially responsive to the needs of early adolescents. 
These responsive practices include group advisory periods, 
interdisciplinary teacher teams, remedial instruction programs, and "school 
transition" activities. Multiple regression analyses suggest that grade 
organization is not a consistent determinant of responsive middle-grades 
practices. Overall, 7-9 junior high and 7-12 combination schools have 
fewer responsive practices than other middle-grade organizations. There 
are educationally significant but modest relationships between a school's 
use of responsive practices and principals' perceptions of the outcomes 
obtained by the school and its students. Different practices are associated 
with different indicators of school and student success. Principals report 
a stronger school program overall when they invest heavily in 
interdisciplinary teams of teachers to create supportive conditions for teachers 
and students. Principals expect fewer students to drop out before high 
school graduation when the school uses supportive advisory group activities 
or responsive remediation programs. Principals report that extensive 
school transition programs reduce the number of students who need to 
repeat the grade immediately following the transition. The implications 
of the results for the improvement of education in the middle grades 
are discussed. 

For many youth, early adolescence is one of the last real 
opportunities to affect their educational and personal trajectory. The 
middle grade school, one of the key socializing institutions for 
young adolescents, represents a critical "turning point" in the lives 
of American youth. [Jackson and Hornbeck 1989, p. 831] 

The most successful [middle grade] schools ... are those meeting 
the needs of early adolescents for security, support, and success 
in a proactive manner. [Van Hoose and Strahan 1988, p. 26] 

Early adolescents are characterized by a plethora of simultaneous and 
often conflicting needs (Epstein 1988; Van Hoose and Strahan 1988). 
For example, they need the security and support of close, caring adult 
supervision and guidance at the same time that they need increasing 
autonomy from adults. They need and want attention and recognition 
for their own unique abilities, successes, and achievements, but they 
also want to be part of a crowd. As they engage in the life-shaping 
process of self-exploration and self-definition, they need help in 
remedying their weaknesses and developing their strengths, but, in order 
to be effective, this help must be offered in a way that does not 
stigmatize them, label them, or separate them from their peers (Mac 
Iver and Epstein, in press). 

Since the turn of the century, when G. Stanley Hall (1905) published 
the first major text on adolescence as a separate stage of development, 
adolescent psychology has been influencing the rationale, curricula, 
and pedagogy of middle-grades schools (Perlstein and Tobin 1988). 
For example, since the 1920s, some middle-grades schools have been 
implementing programs that they believe are especially responsive to 
early adolescents, including such practices as exploratory courses (Koos 
1927; Smith 1925), homerooms and teacher advisories (Hieronimus 
1917), and extracurricular activities (Briggs 1922; Kitson 1926). In 
the fifties, sixties, and seventies, key middle-level educators began 
advocating additional developmentally appropriate practices including 
(a) core curriculum approaches emphasizing the correlation of subject 
areas, the integration of learning across disciplinary boundaries, and 
interdisciplinary team teaching, (b) discovery and inquiry methods, 
(c) flexible scheduling, and (d) ungraded programs (Alexander and 
George, 1981; Mac Iver and Epstein, in press). 

The middle-school movement, which now has almost three decades 
of experience behind it, is devoted to implementing responsive 
practices in the middle grades. This movement has met with mixed success. 
Although some middle-grades schools are far along the road to 
institutionalizing many of the practices listed above, others have not 
begun the restructuring process or have consciously rejected certain 
recommended practices (Epstein and Mac Iver 1990). One reason for 
the great diversity of educational practices and approaches currently 
found in middle-grades education is that there has been little useful 
research to help educators decide which practices are beneficial for 
early adolescent students and which are ineffective. The research that 
has been done has been limited in the location and nature of the 
samples of schools and students, the breadth and depth of information 
on middle-grades practices, or the comparisons of alternative 
organizations of middle-grades schools. 

for Research on Effective Schooling for Disadvantaged Students (CDS). 
Center on Families, Communities, Schools and Children's Learning 
and also codirector of the Middle Grades Program at CDS. 

Out of the many responsive practices appearing on past or current 
lists of recommendations for education in the middle grades, this 
article focuses on (a) teacher advisory, homeroom, or group advisory 
programs; (b) interdisciplinary teams of teachers who share the same 
students and coordinate their instructional programs across subjects; 
(c) special remedial activities for students who fall behind or learn 
more slowly than other students; and (d) transition or articulation 
activities with students, parents, and school staff to ease students' 
transitions from one level of schooling to the next (i.e., from the 
elementary to the middle grades, and from the middle to the high 
school grades). 

Few data exist concerning the prevalence or effects of such programs. 
This study examines the structure, use, and perceived effects of these 
four different types of responsive practices in a national sample of 
public schools that serve young adolescents. It considers differences 
in the use of these practices in schools with different grade spans, in 
different locations, and with different types of student populations. 
Further, the study uses principals' opinions, estimates, and best guesses 
to begin to address the question: Are the practices that are being 
implemented having positive effects on the strength of middle-grades 
programs and student outcomes? That is, it examines how these practices 
are related to principals' evaluations of their middle-grades program, 
to principals' predictions of the percent of current seventh graders 
who will not graduate from high school, and to other school-level 
outcomes such as retention rate. 

Method 

The 2,400 schools in the study are a probability sample of public 
schools in the United States having seventh-grade students. From the 
approximately 25,000 public schools that serve regular seventh-grade 
students, 2,000 schools were sampled with probabilities proportional 
to each school's enrollment per grade level. In addition, two subuniverses 
of schools were oversampled: schools serving both elementary and 
middle grades in metropolitan areas and schools in districts with 
substantial rates of poverty (i.e., Orshansky index at or above 25). 
Approximately 200 of each type were added to the sample, making the 
total sample size 2,400. 

In the spring of 1988, the Johns Hopkins Center for Research on 
Elementary and Middle Schools (CREMS) sent survey forms by mail 
to the principals of the 2,400 schools in the sample. A total of 1,753 
(73 percent) of the principals provided information on their school 
for this study, including 1,344 who returned surveys by mail and 409 
who completed shorter telephone interviews. The telephone interviews 
were conducted with a random subsample of all nonrespondents to 
the mail survey. Weighting the telephone interview responses to account 
for the essentially similar nonresponding schools that were not followed 
up by telephone brings the weighted response rate to 93 percent for 
the items that were common to the mail and telephone surveys. 
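
As a rough check on the figures above, the unweighted response rate can be recomputed from the counts given in the text. The sketch below is illustrative only; the function name is a hypothetical label, and the 93 percent weighted rate is not reproduced because the text does not report the size of the telephone subsample that was attempted.

```python
# Counts reported in the text above.
SAMPLED_SCHOOLS = 2400
MAIL_RETURNS = 1344
PHONE_INTERVIEWS = 409


def unweighted_response_rate(sampled, mail, phone):
    """Share of sampled schools that responded by either mode (hypothetical helper)."""
    return (mail + phone) / sampled


respondents = MAIL_RETURNS + PHONE_INTERVIEWS
rate = unweighted_response_rate(SAMPLED_SCHOOLS, MAIL_RETURNS, PHONE_INTERVIEWS)
print(respondents, round(rate * 100))
```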

For data analysis purposes, each school was first assigned a "weight" 
that was the inverse of its probability of selection. This weighting 
returns the sample to an equal probability (representative) sample of 
schools. Then, because we wanted to describe the experience of the 
typical middle-grades student, each school was upweighted by the 
school's enrollment per grade level, scaled so that the weighted total 
number of schools is equal to the unweighted raw number of schools 
(1,344 for items asked by mail only and 1,753 for items asked over 
the phone and by mail). 
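
The two-step weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual computation: the function name, the selection probabilities, and the enrollments are all hypothetical. Each school's base weight is the inverse of its selection probability; that weight is multiplied by enrollment per grade level, and the results are rescaled so that they sum to the number of schools.

```python
def student_level_weights(selection_probs, enrollments_per_grade):
    """Return analysis weights that sum to the number of schools.

    selection_probs: probability that each school was selected (hypothetical).
    enrollments_per_grade: enrollment per grade level for each school (hypothetical).
    """
    # Step 1: inverse-probability weights return the sample to an
    # equal-probability sample of schools.
    base = [1.0 / p for p in selection_probs]
    # Step 2: upweight by enrollment so results describe the typical student.
    raw = [b * e for b, e in zip(base, enrollments_per_grade)]
    # Step 3: rescale so the weighted count equals the raw number of schools.
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Three schools drawn with probability proportional to enrollment, plus one
# small school from an oversampled stratum (higher probability for its size).
probs = [0.10, 0.05, 0.20, 0.10]
enrollments = [100, 50, 200, 50]
weights = student_level_weights(probs, enrollments)
print([round(w, 3) for w in weights])
```

In this example the three schools sampled proportionally to enrollment receive identical final weights, because the enrollment factor cancels the inverse selection probability; only the oversampled school is weighted down.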

Multiple regression analyses were used to identify significant 
antecedents and consequences of four sets of practices: group advisory 
periods, interdisciplinary teams, remedial instruction activities, and 
school transition programs. The variables used in these analyses are 
presented in the Appendix and are described in more detail later. 
The variables included measures of (a) practices, programs, policies, 
and staff in the middle grades (Appendix, variables I-XI); (b) 
characteristics of the school (variables XII-XIV); (c) characteristics of the 
school's students (variables XV-XVIII); and (d) outcomes obtained 
by the school and its students (variables XIX-XXIV). Throughout 
the questionnaire, principals were reminded to focus only on their 
school's practices and programs for middle-grades students ("students 
in grades 5 through 9") even if the school also contained students 
from other grade levels. The complete questionnaire is found in Epstein 
and Mac Iver (1990). 



American Journal of Education 





Grade-span categories. — Grade span was one school characteristic that 
principals reported (variable XII). The data indicate that public schools 
in the United States that enroll seventh-grade students include 29 
different grade spans. Schools were categorized in six groups for 
analyses: (1) K-8 (schools that start with an elementary grade, usually 
kindergarten, and end with a middle grade, usually eighth); (2) 
K-12 (schools that start with an elementary grade and continue through 
twelfth grade); (3) 7-12 (schools that start with a middle grade and 
continue through twelfth grade); (4) middle schools (mainly 6-8 schools, 
but also 5-8, 5-7, and 6-7); (5) 7-8 schools; and (6) junior high 
schools (schools that start with a middle grade and continue through 
ninth grade). These categories were represented in the analyses by 
five dummy variables; middle schools served as the control category 
(the category coded zero on each dummy variable). The dummy variables 
were used to examine the degree to which the grade span of a school 
predicts implementation of group advisory periods, interdisciplinary 
team approaches, school transition programs, or innovative remedial 
instruction activities in the middle grades. 
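
The coding scheme just described can be illustrated with a short sketch. The category labels come from the text; the function and constant names are hypothetical and are not taken from the study's own analysis files. Each school receives five 0/1 indicators, and a middle school is coded zero on all five, so each regression coefficient for a grade span is read as a contrast with middle schools.

```python
# The five non-reference categories named in the text.
DUMMY_CATEGORIES = ["K-8", "K-12", "7-12", "7-8", "junior high"]
# Middle schools are the control (reference) category.
REFERENCE_CATEGORY = "middle"


def dummy_code(grade_span):
    """Return the five 0/1 indicators for one school (hypothetical helper)."""
    if grade_span != REFERENCE_CATEGORY and grade_span not in DUMMY_CATEGORIES:
        raise ValueError(f"unknown grade-span category: {grade_span!r}")
    return {name: int(grade_span == name) for name in DUMMY_CATEGORIES}


print(dummy_code("junior high"))
print(dummy_code("middle"))
```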

National patterns of grade organization. — More than half of the schools 
fall in the K-8 category or the middle school category (see table 1). 
Less common are 7-12, 7-8, K-12, and junior high schools. The 
number of schools in each category should not be confused with the 
number of students who attend the schools in that category. For example, 
although only about 45 percent of the school buildings that contain 
grade 7 are middle, 7-8, or junior high schools, these schools are 
attended by over 80 percent of all seventh graders. In contrast, the 
most common type of school building, the K-8 school, enrolls only 
9 percent of the nation's seventh graders. 

Group Advisory Periods 

In their attempt to offer early adolescents high-quality instruction 
from subject-matter experts, many schools establish departmentalized 
programs in which students receive instruction from a different teacher 
for each academic subject. However, when students change teachers 
every period, they may feel that there is no one teacher who really 
knows them, cares about them, or is available to help them with problems. 
To provide each student with a teacher who knows and cares about 
the student and is available as a mentor or advisor, many schools have 
established homeroom or group-advisory periods. About two-thirds 
of the schools in the United States that include grade 7 have one 
homeroom or group-advisory period, and 9 percent have two such 
periods (Epstein and Mac Iver 1990). 

Although advisory or homeroom periods are common, many of the 
activities that occur during these periods are mechanical tasks (e.g., 
taking attendance, distributing notices, making announcements, 
orienting students to rules and programs) rather than social and academic 
support activities that use teachers' talents as advisors and that help 
students feel that someone is looking out for their interests and needs. 
To explore the antecedents and consequences of using supportive 
activities during group-advisory periods, a composite was created in- 
dicating the mean frequency of occurrence of nine social or emotional 
support activities during a homeroom or group advisory period (see 
Appendix, variable I). These activities included meeting with individual 
students about problems, giving career information and guidance, 
discussing academic problems or issues, and similar activities. Principals 
indicated how frequently each activity occurred, using a five-point 
scale, ranging from 1 = never to 5 = daily. Schools not having a 
homeroom or group-advisory period were assigned a score of one on 
this variable to indicate that support activities never occurred during 
a group advisory period at these schools. The grand mean for this 
variable was 2.3 (SD = 1.2); each type of support activity occurred 
only a few times per year, on the average. 
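
The scoring rule for this composite can be sketched as follows. This is an illustrative reading of the description above, not the authors' code; the function name and the example ratings are hypothetical. A school with an advisory period receives the mean of its nine ratings on the 1-to-5 scale, and a school without one is assigned a score of 1.

```python
N_ACTIVITIES = 9  # nine support activities were rated


def advisory_support_score(ratings, has_advisory_period=True):
    """Mean frequency rating across the nine support activities.

    ratings: nine integers from 1 (never) to 5 (daily).
    Schools with no homeroom or group-advisory period are scored 1,
    meaning the activities never occur during such a period.
    """
    if not has_advisory_period:
        return 1.0
    if len(ratings) != N_ACTIVITIES:
        raise ValueError("expected nine activity ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)


# Hypothetical ratings for one school.
example_score = advisory_support_score([1, 2, 3, 2, 2, 3, 2, 2, 1])
print(example_score)
```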

Antecedents and perceived consequences of using supportive activities during 
group advisory. — The first column in table 2 summarizes a multiple 
regression model in which the mean frequency of responsive activities 
during advisory period is predicted based on (a) grade organization 
of school, (b) region, (c) the urbanicity of the area in which the school 
is located, (d) the percentage of black students in the school, (e) the 
percentage of the school's families whose income is below the poverty 
line, (f) the percentage of professional or managerial families in the 
school, and (g) the average ability of the students on entry to the 
school (Appendix, variables XII-XVIII). 

Effects of grade organization.— The standardized regression coefficient 
of -.12 for junior high schools in the first column of table 2 indicates 
that these 7-9 schools use supportive group-advisory activities 
significantly less frequently than do 6-8 middle schools (the schools that 
served as the control category). Other grade organizations do not 
significantly differ from 6-8 middle schools in their use of supportive 
group-advisory activities. Other comparisons (not shown in table 2) 
indicate that junior high schools use supportive group-advisory activities 
significantly less often than every grade organization except for 7-12 
schools. However, junior high and 7-12 schools are more likely than 
other schools to have at least one professional guidance counselor 
(Epstein and Mac Iver 1990) and thus may be less likely than others 
to perceive a need for a group-advisory program. Further, junior high 
and 7-12 schools are more likely than most other schools to have a 
large proportion of teachers who have secondary subject-matter 
certification (Epstein and Mac Iver 1990). Teachers who are secondary 
certified may feel poorly prepared to serve as teacher advisors. Most 
of their education focused on helping them become experts in their 
areas of specialization. Typically, they will have received less preparation 
than elementary-certified teachers in understanding and responding 
to students' nonacademic problems, interests, and concerns. 

To test the hypothesis that the lower use of supportive group-advisory 
activities in junior high and 7-12 schools is due to the presence of 
professional guidance counselors and secondary-certified teachers, the 
regression analysis in the first column of table 2 was recalculated after 
adding "presence of guidance counselor" and "percentage of 
secondary-certified teachers" (Appendix, variables II and III) as predictors. As 
expected, schools with guidance counselors were less likely to use 
supportive group-advisory activities (β = -.07, p = .03). Similarly, 
the negative effect of having secondary-certified teachers on use of 
supportive group-advisory activities was nearly significant (β = -.06, 
p = .06). Nevertheless, the differences in use of supportive 
group-advisory activities between junior high or 7-12 schools and middle, 
7-8, and K-8 schools did not lessen even after controlling for these 
two variables. Conversely, K-12 schools were no longer significantly 
different from junior high or 7-12 schools in use of supportive activities. 

The finding that 7-9 junior high, 7-12, and now K-12 schools 
use supportive group-advisory activities less (even after taking into 
account these guidance counselor and certification effects) suggests 
that the inclusion of one or more of the high school grades in a school 
may make it less likely that the school will establish a strong 
group-advisory program for middle graders. Carnegie Unit requirements 
concerning course offerings (which begin in ninth grade) may limit 
the number and length of periods available for group-advisory activities 
in the high school years. Although there is nothing to prevent junior 
high or 7-12 schools from offering frequent group-advisory activities 
to their seventh and eighth graders (even if they cannot offer them 
to their ninth graders), many schools choose not to differentiate their 
program in this way. 

Other antecedents. — There were regional differences in the use of 
supportive group-advisory activities. The -.10 coefficient for the 
Midwest in the first column of table 2 indicates that such activities occurred 
significantly less frequently in the Midwest than in the Northeast. In 
contrast, the West and the South did not significantly differ from the 
Northeast in use of these activities. Supportive group-advisory activities 
were used less in the Midwest than in any other region. 

Finally, the frequency of supportive group-advisory activities increases 
as the percentage of black students in the school increases, as the 
percentage of families below the poverty line increases, and as the 
population of the schools' standard metropolitan statistical area (SMSA) 
increases. That is, schools with poor, predominantly black student 
populations in big cities are more likely than others to establish 
group-advisory periods that frequently provide social and emotional support 
to students. 

Effect of supportive activities on perceived strength of guidance programs. — 
Next, we examined the possible consequences of providing supportive 
activities during homeroom or advisory periods. In schools where 
these activities often occur, are principals more likely to report that 
the school is meeting students' needs for guidance, advice, and 
counseling? Or do such activities make no discernible difference? 

Principals rated the overall quality of their guidance and advisory 
program (Appendix, variable XIX) on a scale ranging from a high of 
4 (signifying an excellent guidance program, in which present practices 
meet students' needs exactly) to a low of 1 (signifying a weak guidance 
program). The .16 coefficient in the second column (last row) of table 
2 indicates that principals in schools with an advisory/homeroom 
program that features frequent use of supportive activities were significantly 
more likely than principals in other schools to rate their guidance 
program as strong. 

The grade-organization effects in the second column show that 
principals in K-8 and K-12 schools rated their overall guidance 
program as significantly weaker than did principals in middle schools. 
Finally, principals in schools serving a high percentage of 
professional/managerial families rated their guidance programs as being 
stronger than did principals in other schools. 

Effect of supportive activities on estimated dropout rates. — We asked prin- 
cipals to estimate the percentage of their current seventh graders who 
probably would not graduate from high school (Appendix, variable 
XX). One possible outcome of a strong homeroom/advisory program 
would be to reduce a school's dropout rate below the rate one would 
otherwise predict based on the school's grade organization, location, 
and type of student population. The final two analyses in columns 3 
and 4 of table 2 indicate that principals in schools with more supportive 
homeroom/advisory activities do, indeed, report a significantly lower 
expected dropout rate for both boys and girls. These analyses also 
indicate that the expected dropout rate is higher in 6-8 middle schools 
than in K-8, K-12, and 7-12 schools, and higher in the West and 
South than in the Northeast. In addition, principals expect more students 
to drop out if their school is located in a big city, if their community 
contains many students living below the poverty line, and if their 
school serves many low-ability children or few children from professional 
families. 

In sum, even after family and student background variables, region, 
and grade organization are statistically controlled, principals in schools 
with well-implemented group-advisory programs report that they have 
stronger overall guidance services and lower expected dropout rates. 
Although principals' estimates of the strength of their guidance services 
and of their future dropout rates are informative, they are imperfectly 
related to objective measures of guidance-program effectiveness and 
to actual dropout rates. Thus, it is important for future research to 
attempt to replicate these findings with more objective measures (e.g., 
once longitudinal data from the National Education Longitudinal Study 
sample [Haffner et al. 1990] are available, it will be possible for 
researchers to compare actual dropout rates for high school students 
who received or did not receive supportive group-advisory services 
when they were in the middle grades). 



Interdisciplinary Teams 



Many proponents of the middle-school philosophy view the 
establishment of interdisciplinary teams of teachers as the keystone of education 
in the middle grades (e.g., Merenbloom 1986; Vars 1987). They 
hypothesize that interdisciplinary teams will eliminate the isolation that 
many teachers feel by providing a working group of colleagues to 
conduct activities and discuss and solve mutual problems; that instruction 
will be more effective in schools that use interdisciplinary teaming 
because of increased integration and coordination across subjects; and 
that teachers on a team sharing the same group of students will be 
able to respond more quickly, personally, and consistently to the needs 
of individual students. 

Our data indicate that about 42 percent of early adolescent students 
receive instruction from interdisciplinary teams of teachers sometime 
between grades 5 and 9. An interdisciplinary team most often consists 
of four teachers— a social studies teacher, an English teacher, a math 
teacher, and a science teacher— who share a group of 100-125 students 
(Epstein and Mac Iver 1990). 

Implementation of an Interdisciplinary Team Approach 

Schools vary in their level of implementation of interdisciplinary teacher 
teams. For example, in some schools, all students receive instruction 
from interdisciplinary teams of teachers, and team members are given 
a common planning period. In other schools, the team approach (if 
adopted at all) may be used with only a subset of the school's students 
(e.g., sixth-graders) and team members may not be given a common 
planning period. To measure the variation between schools in their 
emphasis on an interdisciplinary team approach to school organization 
and instruction, we created a composite variable (Appendix, variable 
IV) ranging from 0 (no use of interdisciplinary teams in the middle 
grades) to :1 (interdisciplinary teams and common planning periods 
at each of the middle grades in the school). 

Implementation of a Departmental Approach 

Some schools may choose to establish and emphasize departments 
instead of, or in addition to, interdisciplinary teams. These schools 
may organize their faculty by subject area, appoint department heads, 
give common planning periods to members of departments, and use 
disciplinary (single-subject) team teaching. A disciplinary organization 
and emphasis may be particularly welcomed by those teachers who 
prefer to identify with a department rather than an interdisciplinary 
team and who find it easier to collaborate with and learn from teachers 
who are in the same discipline. A school's commitment to departments 
was measured by a composite variable ranging from 0 to 3 (Appendix, 
variable V). A maximum score of 3 indicates that the school has 
departments organized with their own chairpersons or heads, has a com- 
mon planning period for members of departments, and uses single- 
subject team teaching in each of the middle grades. 

Antecedents of Interdisciplinary Teaming 

The first column of table 3 summarizes a regression model that explores 
the antecedents of emphasizing an interdisciplinary team approach in 
the middle grades. The standardized coefficients indicating the effects 
of grade organization show that middle schools implement interdis- 
ciplinary teaming significantly more extensively than do other schools. 

The coefficients associated with region (col. 1, rows 6-8) indicate 
that schools in the Northeast are more likely than schools in other 
regions to have adopted an interdisciplinary team emphasis. 

It is interesting that schools that emphasize departments (with de- 
partment heads, common planning periods for departments, and teacher 
teams within departments) are more likely than other schools to also 
organize and emphasize interdisciplinary teams (col. 1, row 14). This 
indicates that a departmental emphasis and an interdisciplinary team 
emphasis coexist in many schools. 

Perceived Consequences of Implementing Interdisciplinary Team 
and Department Approaches in the Middle Grades 

The second column in table 3 reports standardized regression coefficients 
from an equation predicting the strength of each school's overall 
middle-grades program (Appendix, variable XXI), based on its emphasis on 
interdisciplinary teaming, commitment to departments, and other 
variables. The significant positive coefficients in rows 14 and 15 suggest 
that a school's commitment to departments and its implementation of 
interdisciplinary teacher teams are both associated with increases in 
the strength of its overall program, according to principals' reports. 

One other effect was significant. The higher the average ability of 
students in a school, the stronger the ratings given by the principal 
to the school's middle-grades program (row 12). 

There is no evidence that the adoption of an interdisciplinary team 
approach or a commitment to departments reduces dropout rates. On 
the contrary, schools that emphasize interdisciplinary teaming have a 
higher expected dropout rate than would be predicted based on back- 
ground and demographic variables (cols. 3 and 4, row 15). Further, 
a school's level of commitment to departments is not a significant 
predictor of expected dropout rate (row 14). 

Why is the relation between a school's emphasis on interdisciplinary 
teaming and the principal's reports of expected dropout rate positive? 
It may be that a school's dropout rate influences the school's openness 
to making a commitment to interdisciplinary teaming. Schools that 
have a historic pattern of high dropout rates may make stronger 
commitments to this and other promising practices in the hope of reducing 
these rates. At the time of the survey, principals in these schools may 
not yet have known whether using interdisciplinary teams of teachers 
would actually reduce the percentage of their students who would 
leave school before high school graduation. 

An alternative explanation is that a focus on interdisciplinary teaming 
may divert schools from providing sufficient remedial and guidance 
services, which may be critical in dropout prevention. The data do 
not support this alternative hypothesis. Schools with a commitment 
to interdisciplinary teaming actually have more extensive remedial 
programs (r = .12, p < .001); provide more supportive group advisory 
activities (r = .29, p < .001); and have lower students-per-guidance- 
counselor ratios (r = -.07, p < .05). These correlations suggest that 
the original hypothesis may be correct. That is, schools with high 
dropout rates may often adopt interdisciplinary teaming and other 
responsive practices in their attempt to rescue potential dropouts. 



Does Increased Common Planning Time and the Establishment 
of Team Leaders Help Teams Succeed? 

One might assume that having sufficient common planning time to 
do collaborative work and having a team leader — someone who is 
directly responsible for coordinating and organizing team activities — 
would help an interdisciplinary team succeed. Yet, only 36 percent of 
the schools that use interdisciplinary teams give team members two 
or more hours of common planning time each week and less than 60 
percent of all teams have a formal team leader (an elected or appointed 
leader, or a system in which team leadership rotates among members). 

The correlations in table 4 suggest that the provision of adequate 
planning time and the establishment of team leaders make a real 
difference in principals' opinions of how a team functions and in what 
it accomplishes. For example, the amount of common planning time 
allocated to interdisciplinary teams (Appendix, variable VI) is strongly 

TABLE 4

Zero-Order Correlations among Four Teaming-related Variables

                                                   A      B      C      D

A. Amount of common planning time
B. Have team leader? (0 = no, 1 = yes)            .31
C. Proportion of common planning time spent
   on team activities: coordinate content,
   revise schedules, regroup students, etc.       .39    .42
D. Benefits resulting from use of
   interdisciplinary teams                        .21    .18    .36

Note. — All correlations are statistically significant: P < .001.



associated with principals' estimates of the proportion of time that 
team members spend coordinating their activities (deciding common 
themes and related topics for instruction, altering schedules, regrouping 
students, discussing problems of specific students and arranging help, 
and so on [Appendix, variable VII, b-g]). Larger amounts of common 
planning time are also associated with obtaining greater benefits from 
interdisciplinary teaming according to principals' reports (Appendix, 
variable XXII). Similarly, as shown in column B of table 4, when
interdisciplinary teams have formal leaders (Appendix, variable VIII),
teams spend more of their common time engaged in team activities
and produce greater benefits for their school.
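
As a rough illustration of how a lower-triangular table of zero-order correlations such as table 4 can be computed, the sketch below uses plain Pearson correlations. The function names, variable labels, and the five toy schools are all hypothetical; none of the values come from the survey.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson zero-order correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lower_triangle(columns):
    """Return {(row_name, col_name): r} for every pair below the diagonal."""
    names = list(columns)
    return {
        (names[i], names[j]): pearson_r(columns[names[i]], columns[names[j]])
        for i in range(len(names))
        for j in range(i)
    }


# Hypothetical toy values for five schools (NOT data from the survey).
toy = {
    "A_planning_time": [1, 3, 4, 5, 6],            # 1-6 scale, variable VI
    "B_team_leader": [0, 0, 1, 1, 1],              # 0 = no, 1 = yes
    "C_team_activity_share": [1.5, 2.0, 3.0, 3.5, 4.0],
}
table = lower_triangle(toy)
for (row, col), r in table.items():
    print(f"{row} x {col}: r = {r:.2f}")
```

Each entry pairs a row variable with an earlier column variable, which mirrors the layout of table 4, where only cells below the diagonal are filled.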

Remedial Instruction Activities 



All middle-grades schools have some students who fall behind or learn 
more slowly than others. The Carnegie Task Force on the Education 
of Young Adolescents (1989) recommends that all middle-level schools 
proactively address the needs of these students through remedial in-
struction activities that provide specialized instruction, extra coaching,
and additional time to learn. We asked principals to report the remedial 
activities offered in their schools (Appendix, variable IX). Over 98 
percent of the principals reported at least one program to help students 
who fell behind. The most common remedial activities were pull-out 
programs in reading or English (61 percent of the seventh graders 
attended a school offering such a program), after- or before-school 
coaching classes (58 percent), summer school (52 percent), and pull-
out programs in math (51 percent). Schools were less likely to offer
students an extra subject period in lieu of an elective or exploratory
course (28 percent), and rarely offered remediation through Saturday
classes (3 percent). Ironically, except for summer school, each of the
special remedial activities listed was most common in schools where
the average academic ability of students is considerably above the
national norm (see table 5). 

Antecedents of the Number of Remedial Programs Offered

The first column in table 6 reports standardized coefficients from a 
regression model that attempted to predict the number of remedial 
programs offered in each school. The adjusted R² of .03 for this model
indicates that very little of the between-school variance in the exten-
siveness of remedial programs is explained by grade organization,
region, and family and student background variables. Only three effects
were significant. The number of remedial programs offered by a school
is positively related to the average ability level of the school's students
and the urbanicity of the area surrounding the school (rows 12 and
13). Also, middle schools offer significantly more remedial programs 
than do 7-12 schools (row 3). 

Perceived Consequences of the Number of Remedial 
Programs Offered 

Ideally, an extensive remedial instruction program should make it
possible for a school to lower its retention rate (Appendix, variable
XXIV). Our data suggest that, instead of serving as an alternative to
retention, an extensive remedial program tends to go along with high
rates of retention (see table 6, col. 2, row 16). Just as we saw with
other indicators of responsive programs, schools with severe problems
(e.g., a high number of flunking students) put in place many practices
(e.g., extensive remedial programs) that they hope will alleviate the
problems eventually. But at the time of the survey, principals saw no 
evidence that extensive remedial programs were making it possible 
for more students to earn promotion. 

There are several other significant predictors of a school's retention 
rate in the middle grades. The grade-organization effects (col. 2, rows
1-5) indicate that the retention rate in middle schools is significantly
lower than that found in 7-12, junior high, and 7-8 schools but is
significantly higher than that found in K-8 schools. Retention rates
are highest in the South and lowest in the West (rows 6-8). As might
be expected, retention rates are higher in schools that serve many
minority students, families living in poverty, and low-ability students
(rows 9, 10, and 12). Finally, school policies concerning the number
of courses students can fail and still be promoted also affect retention
rates (e.g., schools that allow students to be promoted even if they fail
three or more courses have lower retention rates than do other schools).

The sheer number of remedial programs offered does not affect 
principals' predictions of the percent of their current seventh graders 
who will not graduate from high school (cols. 3 and 4, row 16). There
is, however, a significant positive effect of the average retention rate
on estimated dropout rates (row 17). This finding is congruent with
evidence from previous studies which suggests that holding students
back "increases rather than decreases their risk of dropping out of
school" (Grissom and Shepard 1989, p. 34).

The Extra-Subject-Period Approach to Remediation 

Of the remedial practices included on the survey instrument, the practice
of providing students who need extra help with an extra subject period
during the school day (e.g., instead of an elective or exploratory course)
seems especially promising. Remedial activities that occur outside of
the regular school day (after-school or before-school coaching sessions,
Saturday classes, or summer school) are often not well attended by
the students who need the most extra help to master basic skills and
pass courses. Including the "coaching class" as part of a low achiever's
regular school day guarantees that more of the students who need
help will actually receive it. Likewise, remedial programs using the
extra-subject-period approach may be preferable to pull-out programs
because students do not miss part of their other academic instruction
(e.g., a student is not pulled out of social studies or science to receive
extra help in reading or math), and being pulled out of class to receive
help is a highly visible public event that increases the labeling and
stigmatizing of low achievers. In contrast, fewer classmates may know
or care that low achievers are receiving extra academic instruction
during activity period rather than attending one of the other available
electives, activities, or minicourses.

In many schools, students have two or more periods for elective
subjects, so students who receive extra help with basic skills during
one period are not excluded from exploring new subjects. Analyses
reported elsewhere (Mac Iver and Epstein 1990, p. 24) indicate that
schools that use the extra-subject-period approach to remediation have
significantly more extensive exploratory programs than do other schools.
That is, in these schools, even though some students devote some of
their elective time to catching up, it is still the case that substantially 
greater proportions of students receive the opportunity to explore 
traditional electives (e.g., foreign language and home economics) and 
innovative minicourses in a variety of topics. 

Regression analyses reveal that principals in schools that use the 
extra-subject-period approach to remediation do indeed report slightly
lower expected dropout rates for both boys and girls (after controlling
for all the variables in table 6 that are significant predictors of dropout
rate). In schools that use this approach, the principal's estimate of
the percentage of girls who will drop out is 1.4 percent below the rate
that would otherwise be expected (β = -.06, P = .02). For boys, use
of the extra-subject-period approach is associated with a lessening of
the estimated dropout rate by 1.3 percent (β = -.05, P = .04). None
of the other remedial practices in the questionnaire is significantly
associated with principals' predictions concerning dropout rates. These
approaches need further study to determine if and how they help
students succeed. 



Easing the Transition to a Middle Grades School 



More than 88 percent of the public school students in the United 
States enter a new school as they make the transition to the middle
grades (Epstein and Mac Iver 1990). There has been considerable
concern about the negative effects that such school transitions can
have on early adolescents (e.g., Blyth et al. 1983; Eccles and Midgley
1989; Eccles et al. 1984; Simmons and Blyth 1987). In response to
this concern, many middle-grades schools have developed school tran- 
sition programs (Epstein and Mac Iver 1990) and the National Middle 
School Association has begun officially recommending the use of such 
programs ("Resolutions" 1990). 

Principals described the activities used with students, parents, and 
staff in their schools to ease the transition of students to the middle 
grades (see Appendix, variable XI). The three most common activities 
(used by over 40 percent of the principals) were having elementary
school students visit the middle-grades school, having middle-grades 
and elementary administrators meet together on articulation and pro- 
grams, and having middle-grades counselors meet with elementary 
counselors or staff. 

Some potentially promising activities were infrequently used, perhaps
because they are more difficult to implement. For example, only 20
percent or fewer of the principals indicated use of the following practices:
having elementary school students attend regular classes at the middle-
grades school, having summer meetings at the middle-grades school,
and having a buddy program that pairs new students with older ones
on entry to the school (Epstein and Mac Iver 1990). 

Which types of middle-grades schools have the most extensive ar-
ticulation and transition activities in preparing students for entry into
their school? For these analyses, the measure of the extensiveness of 
the activities is the number of activities used at the time of the survey. 
This analysis excluded schools in which there was no transition to new 
buildings (e.g., K-8 and K-12 schools). 

Articulation activities were significantly less extensive in 7-12 schools
than in other schools that begin in the middle grades (see table 7, col.
1, row 1). Schools containing a large percentage of students living in
poverty have less extensive articulation programs (row 8). Schools
serving a large percentage of professional or managerial families, high- 
ability students, and populous urban areas have more extensive pro- 
grams (rows 9-11). 

There is evidence that an extensive articulation program may be 
beneficial. The standardized regression coefficient of .23 (in table 7, 
col. 2, row 12) indicates that principals in schools using numerous and 
diverse articulation activities are more likely to report that their ar- 
ticulation program is meeting student needs. Further, an extensive 
articulation program slightly — but significantly — increases the like-
lihood that students will succeed in their first year in the new school.
That is, the -.07 in row 12 of column 3 indicates that fewer students
are retained to repeat the transition grade in schools that have extensive 
transition programs. Of course, a school's retention policies (Appendix, 
variable X) also influence the percentage of students retained to repeat 
the transition grade (rows 13-15). A greater percentage of students 
are retained in schools where students are typically held back for 
failing one, two, or three courses or for excessive absence or lateness 
than in schools where students are not held back for these reasons 
(e.g., schools where students are held back only for failing four or 
more courses). 

Discussion 

How much do our data support the idea that middle-grades schools 
will be more successful if they adopt supportive structures, practices,
and services that leading educators in the middle-school movement
often recommend as being especially responsive to the needs of early
adolescents? Are there clear payoffs — measurable benefits to students
or to the school program — if schools establish group advisory periods
and interdisciplinary teams, provide remedial activities, and conduct ex-
tensive articulation practices?

First, it must be admitted that the "clear payoffs" question cannot 
be answered conclusively without information on student achievement, 
attitudes, or other important measures. Although principals' estimates 
of the strength of their middle-grades programs and of the benefits 
that result from responsive practices are important, they may or may
not be related to student outcomes. Further, principals' estimates of
future dropout rates and of current retention rates are undoubtedly
imperfect reflections of actual dropout and retention rates. But it also
is true that, in the context of existing research on responsive middle-
grades practices, the data and analyses of this survey greatly extend
knowledge of what practices are being implemented, the types of
schools that are adopting or rejecting recommended practices, and 
the potential effects of these practices. 

The evidence from principals suggests that most of the recommended
practices yield measurable but modest benefits. For example, based
on our data, a school in which the average frequency of occurrence
of nine supportive group advisory activities is weekly rather than a
few times per year is predicted to save 1 percent of the school's students
from dropping out before they finish high school. A school that provides
an extra subject period within the school day to those students who
need coaching or remediation is predicted to reduce its dropout rate
by almost 1.5 percent. A school that uses the average number of
articulation/transition practices is predicted to raise the percentage of
students who succeed in their first year at the new school by approx-
imately 1 percent over the promotion rates observed in otherwise
similar schools that provide no special articulation/transition activities. 
Middle-grades programs in schools that balance a well-implemented 
interdisciplinary teacher team organization with a continuing com- 
mitment to departments are rated as much stronger by their principals 
(almost three-fourths of a point stronger on a 4-point scale) than are 
middle-grades programs in schools where teams, common planning 
periods, team leaders, and department heads are absent. 

These results support the use of responsive practices and may un-
derstate their benefits. The potential benefits of responsive practices
may be still greater than the average benefits reported here because 
some of the measures of practices were gross estimates of general 
aspects or broad distinctions in practices. For example, the measure 
of the extra-subject-period approach to remediation was a simple di-
chotomy. It distinguished schools using any variety of the extra-subject-
period approach from schools that did not offer extra subject periods
during the school day. Schools that provided intensive help during
the extra subject period were lumped together with schools that provided
little remedial instruction during the extra subject period (e.g., schools
in which the period is more like a "study hall" than a "coaching period").
The benefits of having extra subject periods of intensive, well-organized
remedial instruction are undoubtedly larger than the average benefits
of generic extra subject periods. 

Further, the combined benefits of using several responsive practices 
simultaneously are larger than the benefits of using any one practice 
by itself. For example, schools implementing a strong group advisory 
program, an extra-subject-period approach to remediation, and re-
sponsive grading practices (Mac Iver 1990) achieve more than a 3
percent reduction in expected dropout rates. Also, there are other
likely benefits of responsive practices that were not measured at all in
this study. For example, the typical cumulative effects of being in a
responsive middle-grades school for three entire years on young ad-
olescents' motivation to learn, achievement, and engagement and sat-
isfaction with education may be substantial.

In this study (and in any study examining the relations between 
educational practices and outcomes), some of the observed relations 
may be spurious. We have controlled for a large number of possible
"confounding variables" (e.g., average ability of students upon entry;
percent of professional/managerial families in the school; percent
of minority students; retention policies; regional differences in ed-
ucation policies; grade span; percent of families below the poverty
line; urbanicity), but some important but less obvious variables may
have been ignored. Thus our conclusions must be viewed as tentative
rather than as definitive. Still, the results of this study give justifiable
encouragement to the many educators who have been calling and
working for more responsive structures and services in the middle 
grades. 

The results suggest, however, that to realize the benefits of a responsive 
practice, schools must make sure that practices are implemented prop- 
erly. For example, a group advisory period will yield few benefits 
unless the teachers actually use the time to provide frequent social 
and emotional support activities to the students. Similarly, schools that 
organize their faculty into interdisciplinary teams without taking the 
steps necessary to make this organization work (e.g., establishing team
leaders and common planning periods and training members how to
use team planning effectively) may reap few benefits from teaming.

A departmental organization and emphasis is not usually recom-
mended by the leading educators in the middle-school movement.
However, the data from principals suggest that schools that decide to 
emphasize departments and take the steps necessary to make this 
commitment work (establishing department heads, common planning 
periods for departments, and within-department team teaching) may
be able to strengthen their programs just as much as schools that
choose an interdisciplinary emphasis and take the steps necessary to
make this emphasis work. What is equally important, we have seen
that an interdisciplinary team organization and a departmental or- 
ganization are not mutually exclusive. 

Not all alternative approaches are equally beneficial, however. On 
the limited set of school-level outcomes examined in this study, the 
provision of an extra subject period during the school day was more 
beneficial than other approaches to remediation (presumably because 
of higher attendance and lower stigmatization of low achievers when 
the extra-period approach is used). 

Educational researchers concerned with the middle grades are fre- 
quently asked, "What is the best grade span for a middle-grades school?" 
Overall, the responsive practices considered in this article are found 
most consistently in 6-8 middle schools. None of the other grade
organizations used responsive practices significantly more than these
schools. Still, grade organization is not a consistent determinant of
responsive middle-grades practices. For example, although, on average,
K-8 and K-12 schools are significantly less likely than 6-8 schools 
to implement interdisciplinary teaming, they are just as likely as 6-8 
schools to use supportive activities during advisory group periods. 
Overall, 7-9 junior high and 7-12 schools use fewer responsive practices 
than other schools. But some junior high and 7-12 schools are as 
responsive as some middle schools on some practices. 

One should not forget that the conclusions concerning the antecedents 
and consequences of responsive practices are based on data from 
public schools. As noted earlier, the sample did not include any Catholic
or other private schools. Some of the effects described here (e.g., less
use of interdisciplinary teaming in K-8 schools) may or may not
generalize to private schools. 

Many states and many school districts are attempting to restructure
education in the middle grades. For example, 20 states have formed 
or are forming special task forces to examine the status of education 
in the middle grades and to make recommendations for improvement 
(Children's Defense Fund 1988; also see California State Department 
of Education [1987] and Maryland State Department of Education 
[1990]). Also, several major foundations (such as the Carnegie Cor-
poration and the McConnell Clark Foundation) are attempting to
stimulate restructuring efforts in the middle grades. It is, therefore,
a critical time for building a solid middle-grades research base. For
example, studies that explore the natural variation in middle-grades
practices in the "real world" and that test the effects of these variations
on a school's level of success are needed to assist educators in evaluating
and selecting alternative approaches to middle-grades improvement. 

Appendix 

Variables Used in the Regression Analyses 

This Appendix contains selected questionnaire items from Education in the
Middle Grades, a national survey of practices and trends conducted in the spring
of 1988 with a large, representative sample of public schools that include
grade 7. The complete questionnaire may be found in Epstein and Mac Iver
(1990). Throughout the questionnaire, principals were reminded to focus on
their school's practices "in the middle grades (in grades 5 through 9 for the
grades that are in your school)." 

Practices, Programs, Policies, and Staff in the Middle Grades 

I. Use of Supportive Activities during Group Advisory Period

Each school was assigned a score representing the principal's mean response 
to the following set of items: 

How frequently do the following activities occur during a homeroom or
group advisory period in your school?

(e) Meet with individual students about problems.

(f) Give career information and guidance.

(g) Discuss academic problems or issues.

(h) Discuss personal or family problems.

(i) Discuss social relationships and peer groups.

(j) Discuss health issues, e.g., drug use prevention, family planning, etc.

(k) Discuss moral or ethical issues and values.

(l) Discuss intergroup relations and multi-cultural issues.

(m) Develop student self confidence and leadership.

The response scale for each item was: Daily (5), Weekly (4), Monthly (3), A
Few per Year (2), and Never (1).



II. Guidance Counselors in the Middle Grades



About how many different students are assigned to each guidance counselor?
(Give the guidance counselor-student ratio.)

If you have no guidance counselors, write none here: ___ and skip
to the next question.

NUMBER OF STUDENTS PER COUNSELOR: ___

Note. — Two variables were created based on responses to this question:
a "students-per-counselor ratio" and a "presence-of-guidance-counselor"
dummy variable (coded "1" for schools with guidance counselors and coded
"0" otherwise). 



III. Teacher Certification

Middle schools often have some mix of teachers trained and certified for the
elementary grades or secondary grades or middle grades. How many of the
teachers in your school are trained and certified (including provisionally certified
teachers) in these different ways? (Please give your best estimates of these
numbers.)

(a) Teachers with elementary certification.

(b) Teachers with secondary subject-matter certification.

(c) Teachers with specific middle grades certification (separate from ele-
mentary or secondary).

(d) Uncertified teachers.

(e) Other (describe).

Note. — Responses to this question were used to compute the percent of
teachers with each type of certification.



IV. Implementation of an Interdisciplinary Team Approach 

A school's emphasis on an interdisciplinary team approach to instruction and 
school organization was determined based on the principal's responses to 3 
questions: 

Is this practice part of your middle grades program now?

(a) Interdisciplinary teams of teachers who share the same students.

(b) Common planning period for members of interdisciplinary teams.

Does your school use interdisciplinary Team Teaching? Two or more
teachers of different subjects share the same group of students and/or
coordinate their instructional program across subjects.

Circle all grades in which you use interdisciplinary teams: 5 6 7 8 9

Note. — The "Implementation of an Interdisciplinary Team Approach"
composite variable was equal to the number of yes responses on items (a) and
(b) above, plus the proportion of grades in which interdisciplinary teams were 
used (maximum composite score = 3, minimum = 0). 
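
The scoring rule in the note above can be sketched as a small function. This is only an illustration: the function and parameter names are hypothetical, and the sketch assumes the "proportion of grades" is taken over the middle grades (5 through 9) that the school actually contains, which the note does not spell out.

```python
MIDDLE_GRADES = (5, 6, 7, 8, 9)


def team_composite(has_teams, has_common_planning, grades_with_teams, grades_in_school):
    """Composite = yes-count on items (a) and (b) + proportion of grades teamed.

    Assumption: the denominator is the set of middle grades (5-9) present
    in the school; the source note does not state this explicitly.
    """
    yes_count = int(bool(has_teams)) + int(bool(has_common_planning))
    eligible = {g for g in grades_in_school if g in MIDDLE_GRADES}
    if not eligible:
        return float(yes_count)
    teamed = {g for g in grades_with_teams if g in eligible}
    return yes_count + len(teamed) / len(eligible)


# A hypothetical 6-8 school with teams and common planning in every grade.
print(team_composite(True, True, [6, 7, 8], [6, 7, 8]))  # maximum score
# A hypothetical 7-8 school with teams in grade 7 only and no common planning.
print(team_composite(True, False, [7], [7, 8]))
```

Under these assumptions the score runs from 0 to 3, matching the stated minimum and maximum; the same rule would apply to the "Commitment to Departments" composite with department items substituted.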



V. Implementation of a Departmental Organization and Emphasis

A school's level of "commitment to departments" was determined based on
the principal's responses to 3 questions:

Is this practice part of your middle grades program now?

(a) Departments organized with their own chairpersons or heads.

(b) Common planning period for members of departments.

Does your school use department (single-subject) Team Teaching?
Teachers in the same department plan and teach together creating small
group and large group activities by combining classes or regrouping students.

Circle all grades in which you use DEPARTMENT teams: 5 6 7 8 9

Note. — The "Commitment to Departments" composite variable was equal
to the number of yes responses on items (a) and (b) above, plus the proportion
of grades in which department teams were used (maximum composite score =
3, minimum = 0).

VI. Amount of Common Planning Time for Interdisciplinary
Team Members 

How much common planning time is officially scheduled each week for 
the interdisciplinary team? 

No official common planning time (1), Less than 30 minutes a week (2), 
Between one-half and 1 hour per week (3), Between 1 and 2 hours per week 
(4), Between 2 and 3 hours per week (5), More than 3 hours per week (6).

VII. Use of Common Planning Time on Interdisciplinary Teams 

In a typical planning period for an interdisciplinary team, about how much
time is spent on the following activities? Circle one choice for each activity
that comes closest to your estimate of the work your teachers do during team
planning meetings.

(a) Individual Teacher Preparation. Teachers work on their own lessons,
tests, grades.

(b) Coordinate Content. Teachers decide common themes and related topics
for instruction.

(c) Revise Schedules. Teachers arrange or alter schedules for classes that
need more time.

(d) Regroup Students. Teachers arrange small or large groups of students
to match lessons to abilities.

(e) Diagnose Individual Students. Teachers discuss problems of specific
students and arrange help.

(f) Plan Special Events. Teachers arrange assemblies, trips, or other team
activities.

(g) Conduct Conferences with Parents. Teachers meet as a team with individual
parents to solve problems, provide assistance.

The response scale was:

How Much Time Per Planning Period?
None    Little    Less than half    About half    More than half

Note. — The "proportion of common planning time spent on team activities"
composite variable was the mean of each principal's responses to (b)-(g).
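
A minimal sketch of this composite follows. The numeric codes attached to the five response labels (1 through 5) are an assumption made for illustration, since the excerpt does not print them; the function and dictionary names are likewise hypothetical.

```python
# Assumed coding of the response scale; the source lists only the labels.
SCALE = {
    "None": 1,
    "Little": 2,
    "Less than half": 3,
    "About half": 4,
    "More than half": 5,
}

# Items (b)-(g) are team activities; item (a), individual preparation, is excluded.
TEAM_ITEMS = ("b", "c", "d", "e", "f", "g")


def team_activity_share(responses):
    """Mean coded response across items (b)-(g) for one principal."""
    codes = [SCALE[responses[item]] for item in TEAM_ITEMS]
    return sum(codes) / len(codes)


# Hypothetical principal: mostly individual prep, modest team coordination.
example = {
    "a": "More than half",
    "b": "Little",
    "c": "Little",
    "d": "None",
    "e": "Less than half",
    "f": "Little",
    "g": "Little",
}
print(team_activity_share(example))
```

Because item (a) is left out, a school whose teachers use planning periods mainly for their own lesson preparation receives a low score even when planning time is generous.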

VIII. Establishment of Formal Leaders for Interdisciplinary Teams 

How is the leader chosen for the interdisciplinary team of teachers? (Circle 
one.) 
No leader is identified 1

Leader emerges informally as the team works together 2

Appointed by principal or other school official 3

Elected by other members of the teaching team 4

Leader rotates among members over time 5

Note. — The measure, "Have team leader?" was coded "0" if there were
no team leaders or "informal" team leaders (options 1 or 2 above) and was
coded "1" otherwise.



IX. Extensiveness of Remedial Instruction Activities

All schools have some students who fall behind or learn more slowly than 
other students. Does your school offer any of the following remedial activities
for these students? (Circle all that apply.)

No special programs, it is up to students to stay on grade level 1

Extra work or homework by classroom teacher 2 

Pull-out program in reading or English 3 

Pull-out program in math 4 

Extra subject period instead of elective or exploratory course 5 

After-school or before-school classes or coaching sessions 6 

Saturday classes 7 

Summer school 8 

Other (describe) 9 



Note. — The extensiveness of a school's remedial instruction program was 
measured by counting the number of different programs offered by the school.
Practices 1 or 2 ("No special programs" or "extra work or homework") were
not included in this count. 
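
The counting rule in this note is simple enough to sketch directly. The function name is hypothetical, and the sketch assumes that option 9 ("Other") counts as one program, since the note excludes only options 1 and 2.

```python
# Options 1 ("No special programs") and 2 ("Extra work or homework")
# are not counted as remedial programs, per the note above.
EXCLUDED_OPTIONS = {1, 2}


def remedial_extensiveness(circled_options):
    """Number of distinct remedial programs a principal circled (options 3-9)."""
    return len(set(circled_options) - EXCLUDED_OPTIONS)


# Hypothetical school: extra homework, both pull-out programs, summer school.
print(remedial_extensiveness([2, 3, 4, 8]))
```

A school that circles only option 1 or option 2 scores 0 under this rule, so the measure captures the breadth of formal programs rather than whether any help is offered at all.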



X. Retention Policies: Major Reasons Students Are Retained



What are the major reasons most students are retained to repeat a grade in 
your school? (Circle all that apply as major reasons that students repeat the 
middle grades.) 

Failing one course 1

Failing two or three courses 2 

Failing more than three courses 3 

Excessive absence or lateness 4 

Failing achievement or proficiency tests 5 

Other (describe) 6 



XI. Organization of the Transition from the Elementary 
to the Middle Grades 

How do you organize the transition from the elementary to the middle 
grades? (Circle the numbers to the right of ALL of your present practices.)
No transition — middle grades continue in K-8 program 1 
No special activities until students arrive in the fall 2
Middle grades students present information at elementary school 3
Elementary school students visit middle grades school for assembly 4
Elementary school students attend regular classes at middle grades
school 5
Parents visit middle grades school while children are still in elementary
school 6
Parents visit middle grades school for orientation in the fall after
children have entered 7
Summer meetings at the middle grades school 8
Buddy or big brother/sister program pairs new student with older one
on entry 9
Middle grades and elementary teachers meet together about courses
and requirements 10
Middle grades and elementary administrators meet together on
articulation and programs 11
Middle grades counselors meet with elementary school counselors or
staff 12
Other (describe) 13



Characteristics of the School 



X//. Grade Organirjation 

What are the lowest and highest grade levels in your school? (Circle 2
choices.) K K 1 2 3 4 5 6 7 8 9 10 11 12

XIII. Region (as defined by the U.S. Bureau of Census) 

Schools were categorized by region: 
West 
Midwest 
South 

Northeast

This categorization was represented in the analyses by three dummy variables;
the Northeast served as the control category.
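
As a minimal sketch of this coding scheme, the four regions can be turned into three 0/1 indicators with the Northeast as the omitted reference category. The function and variable names are illustrative assumptions; the original analysis software is not described in the text.

```python
# Hypothetical sketch: dummy coding of region with Northeast as the
# control (reference) category, so it gets zeros on all three indicators.
REGIONS = ("West", "Midwest", "South", "Northeast")
REFERENCE = "Northeast"


def region_dummies(region):
    """Return three 0/1 indicators for a school's census region."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    return {name: int(region == name) for name in REGIONS if name != REFERENCE}


print(region_dummies("South"))
print(region_dummies("Northeast"))
```

With this coding, each regression coefficient on a region indicator is read as a difference from schools in the Northeast.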

XIV. Population of SMSA/Urbanicity 

The population of the urbanized area of which the school is a part (in 100s). 
This includes the number of people living in the entire densely settled area 
around a city (e.g., people living in nearby suburbs or outlying cities and
counties). Schools in locations that are not in (nor adjacent to) an urbanized
area are assigned a 0 on this variable.

Characteristics of the School's Students



XV. Percent Minority Students 



Approximately what percentage of your present students are members of the
following racial or ethnic groups?

(a) Black/Afro-American %

(b) Hispanic-American %

(c) Asian-American %

(d) American Indian %



XVI. Percent Families Below Poverty Line



The Orshansky Percentile for the school.



XVII. Percent Professional Families 



Approximately what percentage of the students currently enrolled in your
school are from families in the following categories?

(a) Professional and managerial personnel %



XVIII. Average Ability of Students on Entry 



How would you rate the average academic ability of students when they enter
this school?

Considerably above the national norm (5), Somewhat above the national
norm (4), At the national norm (3), Somewhat below the national norm (2),
Considerably below the national norm (1).



Outcomes for the School and Its Students 
XIX. Strength of the School's Guidance Programs 



How well do your present practices match your ideal program for guidance,
advice, and counseling of students in the middle grades?

excellent: present practices fit students' needs exactly (4); good: basic
practices are in place, minor changes needed (3); fair: need to improve or
add several practices (2); weak: need to design new practices, major changes
needed (1)

XX. Percentage of Boys and Percentage of Girls Who Probably Will
Not Graduate from High School

Based on your experience, past records, or best guesses, please estimate the
percent of your present seventh grade boys and girls who will probably 
not graduate from high school. 

(a) percent of present seventh grade boys who will probably not graduate 
from high school % 

(b) percent of present seventh grade girls who will probably not graduate 
from high school % 



XXI. Strength of the School's Overall Middle Grades Program

How well do your present practices match your ideal of a successful program 
for students in the middle grades? 

excellent: present practices fit students' needs exactly, exemplary program
(4); good: basic practices are in place, minor changes needed, solid
program (3); fair: need to improve or add some practices, developing program
(2); weak: need to design new practices and major revisions, changing
program (1)



XXII. Benefits Resulting from Use of Interdisciplinary Teams

Each school was assigned a score representing the principal's mean response 
across four different types of benefits: 

There are potential benefits in using interdisciplinary teams in the middle 
grades. How often do you think the following occur as a result of interdisciplinary 
teams in your school?

Students identify with the team, build team spirit, and improve school work
and attitudes (1 = Never, 5 = Always).

Individual student problems are recognized quickly and solved effectively
(1 = Never, 5 = Always).

Teachers use other team members as sources of social support and understanding
(1 = Never, 5 = Always).

Instruction is more effective due to integration and coordination across
subjects and courses (1 = Never, 5 = Always).



XXIII. Strength of the School's Transition/Articulation Program

How well do your present practices match your ideal program for students'
smooth transitions to and from the middle grades? Circle one choice.

excellent: present practices fit students' needs exactly (4); good: basic
practices are in place, minor changes needed (3); fair: need to improve or
add several practices (2); weak: need to design new practices and major
changes (1).

XXIV. Retention Rates in the Middle Grades 



At the end of last school year (after summer school), about how many students 
were promoted to the next grade and how many were retained to repeat the 
same grade this year? (Give approximate numbers.) 

For 1987 School Year after Summer School, number of students

(a) From Grade 5: ___ promoted ___ retained

(b) From Grade 6: ___ promoted ___ retained

(c) From Grade 7: ___ promoted ___ retained

(d) From Grade 8: ___ promoted ___ retained

(e) From Grade 9: ___ promoted ___ retained



Notes 



We are indebted to colleagues at CDS who collaborated in the design of
the questionnaire for this survey, including Henry Jay Becker, Jomills H.
Braddock III, and James M. McPartland. We appreciate the support of the
National Association of Elementary School Principals and the National
Association of Secondary School Principals in the survey effort. We are especially
grateful to the principals in schools that contain grade 7 who invested their
time to provide information about middle-grades education. We would also
like to thank Allan Wigfield and anonymous reviewers for helpful comments
on an earlier draft. This research is sponsored by the Office of Educational
Research and Improvement, U.S. Department of Education, under grant
number OERI-G-86-90006. The opinions presented are those of the authors
and no official agreement by OERI should be inferred.



School Competency Testing Reforms and Student Achievement:
Exploring a National Perspective 



Linda F. Winfield 

The Johns Hopkins University Center for Research 
on Effective Schooling for Disadvantaged Students 

This study investigates the relationship between school-level minimum competency testing
(MCT) programs and student reading proficiency as measured by the 1983-1984 National
Assessment of Educational Progress (NAEP). Comparisons of student-level proficiency outcomes
within race/ethnic groups (White, Black, and Hispanic) were made after adjusting for
individual and school-level variables for the 4th-, 8th-, and 11th-grade NAEP samples. In
general, results indicated a higher level of proficiency among students in Grades 8 and 11
attending schools with MCT programs compared with their counterparts in schools without
such programs. No advantage of attending such schools was identified for students in
Grade 4.



Since the early 1970s and throughout the 
1980s, numerous reform initiatives have 
sought to increase the accountability and 
effectiveness of public education in Amer- 
ica. Timar and Kirp (1989) note: "Since 
1983 the states have generated more rules 
and regulations about all aspects of education
than in the previous twenty years" (p.
506). Efforts to improve student achieve- 
ment outcomes have included increasing 
graduation requirements and implementing 
assessment programs that define both standards
of performance for students and standards
of accountability for the educational
system. Competency-based testing programs
implemented at the local and state levels
have continually increased as a primary
method of reform (Airasian, 1987). Since
the mid-seventies, over 35 states have re- 
quired local school districts to give mini- 
mum competency tests (MCT) to students 
in elementary, junior high, or senior high 
school (Pipho, 1983). In 1984, 40 states were
actively involved in some aspect of minimum
competency testing, 19 states were
using test performance as a basis for high
school graduation, and 5 states were using
tests as a basis for grade promotion (Anderson
& Pipho, 1984; Pipho & Hadley, 1984).

The majority of MCT programs focus on 
improvement in basic skills in reading and 
math (Educational Commission of the
States, 1984). One basic premise for implementing
such programs is that the tests
clearly specify learning objectives and encourage
schools and teachers to focus instruction
more precisely. Additionally, MCT
results can be used as a basis for diagnosis
and remediation of academic skills (Cohen 
& Haney, 1980) and consequently lead to 
higher student achievement. A major objec- 
tion to MCT programs, however, is that they 
lead to "teaching to the test" and a narrow 
focus of instruction that neglects those skills 
not included on the tests (Broudy, 1980; 
Koretz, 1988). If this is the case, the overall 
quality of school programs is being adversely 
affected. Others suggest that competency 
tests and standards function more as sym- 
bolic and political gestures than as instru- 
mental reforms (Ellwein, Glass, & Smith, 
1988). 

The Impact of MCT Programs 

Two related areas of research on student 
outcomes provide a framework for concep- 
tualizing the potential impact of compe- 
tency-based programs. First, research on 
schools that are considered "unusually effec- 
tive" in facilitating reading achievement in 
minority and low-socioeconomic (SES) pop- 
ulations suggests that MCT programs con- 
tribute to school-wide success through a 
clear definition of learning objectives, curriculum
organization, and careful monitoring
(Edmonds, 1979; Eubanks & Levine,
1983; Good & Brophy, 1986; Kyle, 1985;
Mackenzie, 1983; Purkey & Smith, 1983,
1985; Stedman, 1987; Stringfield & Teddlie,
1988). In contrast, this research also indicates
that school-wide improvement in student
achievement is more likely to occur
from increasing local decision-making and 
school-site responsibility. Change can be 
successfully implemented at the individual 
school building level, given the appropriate 
conditions, procedures, and support systems 
(Darling-Hammond & Wise, 1985; Fullan,
1985; Goodlad, 1984; Sirotnik & Clark,
1988; Sirotnik & Oakes, 1986). Thus, a combination
of top-down and bottom-up approaches
in implementing school improvement
efforts appears to be necessary for success.

The second related area of literature, gen- 
erally known as "school effectiveness" re- 
search, attempts to isolate the relative effect 
of specific school characteristics as compared
with that of family background and
SES on academic achievement. (See Ma- 
daus, Airasian, & Kellaghan, 1980, for a 
review of the evidence of this area). Some 
studies in this area, derived from a sociological
perspective, indicate that student background
characteristics have a higher impact
on achievement than do school characteristics
(Coleman et al., 1966; Jencks et al.,
1972; Mosteller & Moynihan, 1972; Smith,
1972). Other studies, however, emphasize
the combined effect of home and school
factors (Mayeske et al., 1972; Mayeske,
Okada, Cohen, Beaton, & Wisler, 1973).
More recent studies, including those on "ef- 
fective" schools, have provided additional 
evidence which indicates that specific school 
characteristics can facilitate achievement 
(Barr & Dreeben, 1983; Edmonds, 1979;
Frederiksen, 1975; Good & Brophy, 1986;
Kyle, 1985; Purkey & Smith, 1983, 1985;
Stedman, 1987; Venezky & Winfield, 1979).
Specific school level policies and institu- 
tional practices (e.g., grouping, instructional 
pace, and content coverage) are related to 
student reading achievement outcomes 
(Barr & Dreeben, 1983). The framework of 
the "nested layers" in which schools operate 
further suggests that actions at higher layers 
(e.g., district and state) influence conditions
occurring at the school and classroom levels
(Purkey & Smith, 1983). Considered in this 
manner, MCT programs may be viewed as 
a state, district, or school-level variable influ- 
encing academic achievement. 

As a result of competency programs, in- 
creased attention has focused on outcome 
measures (Murphy, 1989), and more re- 
sources are targeted for students who need 
remediation. However, Black and low-SES 
students fail MCTs in substantially higher 
proportions than do White and higher-SES 
students (Jaeger, 1982; Jonas & Wallace,
1986; Linn, Madaus, & Pedulla, 1982;
Serow, 1984), and remediation for students
who had failed competency tests in reading 
was found to be less effective for Black stu- 
dents than for White students (Serow & 
Davies, 1982). Serow (1984) reported that 
in four states in which MCT programs had 
been implemented at the secondary level,
Black students had a substantially lower
passing rate than did Whites. Moreover, in 
one state, lower SES students were about 
one third less likely to pass the exam on their 
first attempt compared with higher SES stu- 
dents. Unfortunately, too few studies have 
assessed the impact on minority and low- 
SES students. In general, minority issues 
receive the most attention during test con- 
struction and validation (Ellwein et al., 
1988). 

Over 400 articles were published between 
1977 and 1987 regarding MCT; however, 
62% of this literature was rhetorical 
(Ellwein, Popp, & Neimann, 1988). There 
have been few empirical research studies of 
the effects of MCT on student achievement. 
Most studies have been conducted at the 
local or state level and have used as an 
outcome measure the percentage of students 
passing the test. Thus, if MCT programs 
enhance initial test-taking scores, then a de- 
cline in the percentage of students previously 
failing the test is taken as indirect evidence 
of improved student outcomes. However, 
what appears to be an indicator of improve- 
ment in students' basic skills might reflect 
either practice or regression to the mean 
(Serow, 1984). Additionally, the criteria for 
passing may fluctuate over time. One of the 
few studies that used standardized achieve- 
ment as an outcome found that after imple- 
mentation of MCT programs, ninth-grade 
math basic skills increased but did not show 
continual increases over the succeeding 3 
years (Mangino & Babcock, 1986). 

Despite the limited knowledge base on 
implementation effects, the number of states 
and districts that implemented testing re- 
forms increased dramatically during the past 
decade. Some analysts suggested that these 
reforms will yield few returns because they 
are built upon and reinforce existing organ- 
izational arrangements (Chubb, 1988). Oth- 
ers suggest that the reforms may be success- 
ful because key organizational linkages in 
existing school structures have been tight- 
ened (Murphy, 1989). What impact have 
testing reforms had on student achievement 
outcomes nationally? Because of the variation
among testing programs in contexts,
procedures, and criteria for passing, it is
difficult to compare student achievement 
outcomes on anything but an intrastate ba- 
sis. The need for accurate information re- 
garding school quality and educational re- 
forms has been documented (Alexander- 
James, 1987). The National Assessment of 
Educational Progress (NAEP) will be rede- 
signed to provide state-by-state comparisons 
in the area of reading and mathematics in 
1992. Although there is evidence that basic 
reading skills improved nationwide over the 
last several years (NAEP, 1985), one analysis 
suggested that this trend could not be attrib- 
uted to competency-testing programs be- 
cause the upturn in achievement had already 
been under way a few years prior to the 
major growth of MCT reforms in the late 
1970s (Congressional Budget Office, 1987). 
However, the aggregated trend data do not 
provide direct evidence of the effect of these 
programs on schools and students. The ma- 
jor substantive questions of interest in this 
study are two: (a) What is the relationship 
between school-level MCT programs and 
student reading achievement outcomes? (b) 
Does this relationship differ for various race/ 
ethnic groups? A second concern is the feasibility
and utility of using the new, redesigned
NAEP (Messick, Beaton, & Lord,
1983) data in addressing these issues. For
this reason, the study should be considered 
as exploratory in nature. 

Method 

Sample 

The data for this study are from the 1983- 
1984 NAEP. The NAEP is funded by the 
Office for Educational Research and Im- 
provement and is under a grant for the Ed- 
ucational Testing Service. Each NAEP as- 
sessment involves a random cross-sectional 
survey of in-school 9-, 13-, and 17-year-olds.
In the 1983-1984 assessment, in addition to 
sampling by age, Grades 4, 8, and 11 were
also sampled. Each age/grade cohort in- 
cluded approximately 30,000 students. The 
1983-1984 NAEP sample was based on a 
stratified, four-stage probability sampling 
design in which counties, schools, type of 
sessions, and students were sampled. In selecting
schools, those in large cities with high
concentrations of low-SES students and 
those in extremely rural areas were sampled 
at twice the rate of other schools. A total of 
64 first-stage units was included in the sam- 
ple to represent the 50 states and the District 
of Columbia, and assessments were con- 
ducted at 1,465 schools. 

In addition to the assessment of student- 
level data, NAEP collected school-level in- 
formation from school administrators. A 
principal or his or her representative com- 
pleted the five-page questionnaire concern- 
ing staffing patterns, curriculum, and stu- 
dent services. The overall survey response 
rates were 81% for Grade 4, 75% for Grade 
8, and 75% for Grade 11.

Subsample 

Schools included in this study are a non- 
random subsample of the original NAEP 
sample. Schools were included only if the 
principal (a) responded to the school ques- 
tionnaire and (b) provided responses to the 
minimum competency questions that were 
consistent across the items included. The 
school response rates for the item requesting 
information on minimum competency test- 
ing were lower than overall survey response 
rates and were 49% for Grade 4, 52% for 
Grade 8, and 60% for Grade 11. Analyses 
of schools that did not respond to the MCT 
item indicated no significant differences on 
the school- and student-level variables in- 
cluded in the study. These data and other 
information characterizing the schools and 
students in the study can be found in a 
discussion by Winfield (1987b). Because 
NAEP produces a representative national 
sample, each student or school has an asso- 
ciated sampling weight to account for the 
differential probability of selection and ad- 
justments for nonresponse and poststratifi- 
cation. To ensure adequate representation, 
certain subgroups were sampled at a higher 
rate than the rest of the population. Thus, 
sampling weights were used in all analyses. 
(These weights were rescaled so that the sum 
of the weights equaled the number of cases 
included in each analysis. See NAEP, 1986a,
1986b, for procedures to be used when analyzing
NAEP data.) The unweighted and
adjusted weighted frequencies of schools and 
students in the total subsample are shown in 
Table 1. 
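
The rescaling described in the parenthetical note amounts to multiplying every weight by the number of cases divided by the sum of the weights. The following is a minimal sketch under that reading; the function name and the example weights are illustrative assumptions and are not taken from the NAEP data files.

```python
# Hypothetical sketch: rescale sampling weights so that they sum to the
# number of cases in the analysis, preserving their relative sizes.
def rescale_weights(weights):
    """Return weights multiplied by n / sum(weights)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    factor = len(weights) / total
    return [w * factor for w in weights]


raw = [2.0, 4.0, 6.0]            # made-up weights for three cases
scaled = rescale_weights(raw)
print(scaled, sum(scaled))       # the rescaled weights sum to 3 cases
```

Rescaling in this way leaves weighted means unchanged while keeping the weighted case count equal to the actual number of cases.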

In subsequent analyses, the sizes of the
subsamples for Grades 4, 8, and 11 were
10,367, 10,829, and 13,513, respectively.
These numbers represent 39.8% of the total
NAEP Grade 4 cohort, 41.8% of the Grade
8 cohort, and 55.2% of the Grade 11 cohort.
The number in each racial/ethnic group was 
7,491 Whites, 1,733 Blacks, and 1,143 Hispanics
in Grade 4; 7,574 Whites, 1,906
Blacks, and 1,349 Hispanics in Grade 8; and
9,203 Whites, 2,112 Blacks, and 2,198 Hispanics
in Grade 11.

Reading Proficiency Outcome Variable 

The goal of NAEP is to estimate group 
means rather than individual proficiency; 
thus, each respondent may answer only a 
subset of the total number of assessment 
items. In the 1983-1984 assessment, NAEP 
used a balanced incomplete block (BIB) spi- 
raling procedure in which the total assess- 
ment was divided into blocks of 15 minutes 
each. Each student was administered three 
15-minute blocks of items and a 6-minute 
block of general background questions. The 
BIB part of the method assigned blocks to 
booklets in such a way that each block ap- 
peared in the same number of booklets and 
each pair of blocks appeared in at least one 
booklet. The spiraling part of the method 
then cycled the booklets for administration 
so that no two students in any assessment 
session in a school received the same booklet.
At each age group, each block is administered
to approximately 2,000 students and
each pair of blocks to a smaller number, 
depending upon the particular BIB design 
(NAEP, 1986a). 
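
To make the two properties of the BIB part concrete, the sketch below builds a small hypothetical design (7 blocks, 7 booklets of 3 blocks each) and checks that each block appears in the same number of booklets and that every pair of blocks shares a booklet. The block counts, the cyclic construction, and all names are assumptions chosen for illustration; they are not the actual 1983-1984 NAEP design.

```python
# Hypothetical sketch of a balanced incomplete block (BIB) design with
# spiraled booklet assignment. Not the actual NAEP booklet layout.
from collections import Counter
from itertools import combinations


def cyclic_bib(num_blocks=7, base=(0, 1, 3)):
    """Build booklets by shifting a base set of blocks around a cycle."""
    return [
        tuple(sorted((b + shift) % num_blocks for b in base))
        for shift in range(num_blocks)
    ]


def block_counts(booklets):
    """Number of booklets in which each block appears."""
    return Counter(block for booklet in booklets for block in booklet)


def pair_counts(booklets):
    """Number of booklets in which each pair of blocks appears together."""
    return Counter(
        pair for booklet in booklets for pair in combinations(booklet, 2)
    )


def spiral(booklets, num_students):
    """Cycle through the booklets in order when handing them to students."""
    return [booklets[i % len(booklets)] for i in range(num_students)]


booklets = cyclic_bib()
print(booklets)
print(block_counts(booklets))
print(pair_counts(booklets))
print(spiral(booklets, 10))
```

In this toy design a session of up to seven students receives seven different booklets; a larger design would be needed to guarantee distinct booklets for larger sessions.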

Item response theory (IRT) technology 
was used to estimate reading proficiency 
levels. This theory defines a student's prob- 
ability of answering an item correctly as a 
mathematical function of an underlying pro- 
ficiency or skill. Indicators of proficiency are 
computed as random draws from the ex- 
pected distribution of proficiency of each 
respondent given the observed data, in this

TABLE 1

Unweighted and adjusted weighted frequencies of schools and students by school response to NAEP
MCT item

                                   Schools                          Students
Grade and response     Unweighted  Weighted      %      Unweighted  Weighted      %

Grade 4 min. comp.
  Yes                      169        334      25.3        7,226     11,121     29.0
  No                       154        312      23.6        5,289      7,952     20.7
  No response              340        676      51.1       13,489     19,222     50.3
  Total                    663      1,322     100         26,004     38,295    100

Grade 8 min. comp.
  Yes                      141        262      27.0        6,744     13,426     32.1
  No                       110        235      24.3        4,521      8,431     20.1
  No response              235        472      48.7       10,573     20,004     47.8
  Total                    486        969     100         21,838     41,921    100

Grade 11 min. comp.
  Yes                      118        203      30.6        9,170     17,621     41.5
  No                        82        152      25.2        5,454      9,778     23.0
  No response              131        238      44.2        8,119     15,067     35.5
  Total                    331        638     100         22,788     42,466    100

Note. NAEP = National Assessment of Educational Progress; MCT = Minimum Competency Test; min. comp. =
minimum competency.



instance responses to NAEP reading exercises
and background variables. (See Mislevy,
1985, for the statistical foundations of
this approach.) The distribution of such 
draws, one taken for each respondent and 
weighted in inverse proportion to the re- 
spondent's probability of appearing in the 
sample, estimates the distribution of profi- 
ciency in the population as a whole or in a 
given subpopulation. Because the resulting 
indicators do not represent precise estimates 
of proficiency for individual examinees, five 
"plausible values" from this distribution are 
provided for each student who was administered
at least one block with reading items.
The NAEP reading proficiency scale ranges 
from 0 to 500 with a mean of 305 and a 
standard deviation of 50. 
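
The idea that the probability of a correct answer is a mathematical function of underlying proficiency can be illustrated with a logistic item response function. The sketch below assumes a three-parameter logistic form with the conventional 1.7 scaling constant; the text does not state which IRT model was used, so the functional form, the parameter names, and the example values are all assumptions.

```python
# Hypothetical sketch of an item response function (three-parameter
# logistic form, assumed here for illustration only).
from math import exp


def prob_correct(theta, a, b, c=0.0, scale=1.7):
    """Probability of a correct answer given proficiency theta.

    a: item discrimination, b: item difficulty, c: lower asymptote
    (chance of answering correctly by guessing).
    """
    return c + (1.0 - c) / (1.0 + exp(-scale * a * (theta - b)))


# When proficiency equals the item difficulty, the probability lies
# halfway between the guessing level and 1.
print(prob_correct(theta=0.0, a=1.0, b=0.0, c=0.2))
```

Under such a model, students who answered different blocks of items can still be placed on a common proficiency scale.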

School-Level, Individual, and Control
Variables 

Although a number of school-level variables
were included in initial analyses (e.g.,
teacher turnover, hours of in-service,
whether there was a Chapter 1 program, and
student absenteeism), many of these were
deleted in the final analyses because of multicollinearity.
Variables included were identified
through a combination of stepwise
multiple regression and theoretical
relevance. Variables selected at the
student level were family background, stu- 
dent academic behaviors, age, and sex. Fam- 
ily background consisted of responses to 
items on parental education, reading mate- 
rials in the home, and the extent of family 
reading. Student academic behaviors con- 
sisted of students' responses to items re- 
questing the number of pages read for school 
and the amount of homework. Control variables
were region of the country, percentage
of students on free lunch, school racial
composition, and school district SES. Two
potential explanatory variables were school- 
level aggregate of instructional dollars per 
pupil and the presence/absence of a reme- 
dial program for students failing the MCT. 

School-level effect of interest: MCT program
in reading. Two of the four items on
the NAEP questionnaire, taken from principals'
self-reports, were used to identify
schools implementing MCT programs. One
item read: "In which of the following subjects
are students required to pass a minimum
competency test?" Respondents were
required to answer either yes or no to items 
identifying several subject areas, one of 
which was reading. The second item read: 
"In what year was each of the following 
minimum competency tests first adminis- 
tered?" Responses to this item ranged from 
year 1976 to 1984. Affirmative responses to 
the MCT item on reading were then coded 
according to whether the program had been 
implemented prior to 1980. School- and stu- 
dent-level descriptive data for each grade 
level by school type and for each race/ethnic 
group included in the study are presented in 
Tables 2, 3, and 4.
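
One way to read the coding of the two questionnaire items is sketched below. The function name, the argument names, and the choice to return two separate 0/1 indicators are assumptions made for illustration; the text says only that affirmative responses were coded by whether the program began before 1980.

```python
# Hypothetical sketch of coding the two MCT questionnaire items.
def code_mct_reading(requires_reading_mct, first_year=None):
    """Return (mct, implemented_before_1980) as 0/1 indicators.

    requires_reading_mct: principal's yes/no answer for reading.
    first_year: reported year the reading test was first administered
                (1976 to 1984 in the survey), or None if not reported.
    """
    mct = 1 if requires_reading_mct else 0
    early = 1 if (mct and first_year is not None and first_year < 1980) else 0
    return mct, early


print(code_mct_reading(True, 1978))   # program in place before 1980
print(code_mct_reading(True, 1982))   # program started in 1980 or later
print(code_mct_reading(False))        # no reading MCT
```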

Unadjusted reading proficiency means 
and standard deviations by grade, race/eth- 
nic group, and school type are shown in 
Table 5. 

Design and Data Analysis

What is the relationship between school- 
level MCT and student reading proficiency? 
In this multilevel analysis of schools, individual
student reading proficiencies rather
than school-level aggregate achievement 
(Burstein & Miller, 1981) were used as dependent
measures. From a statistical framework,
one might ask, What is the effect on
student proficiency of having an MCT pro- 
gram in reading after controlling for regional 
variation, school-level SES, and individual 
student variables? Is the effect of MCT on 
reading proficiency outcomes the same for 
all race/ethnic groups? Ideally, to answer 
either of these questions, individual student- 
level achievement within schools must be 
examined for changes in the distribution of 
reading proficiency for various race/ethnic 
and SES groups over a period of time. However,
NAEP data are cross-sectional and
available for one time period only; thus this 
investigation is limited to the direction and 
strength of correlates of achievement. 1 

An analysis of covariance within a regres- 
sion framework was conducted for each 
race/ethnic group (White, Black, and His- 
panic) with each of the three grade cohorts. 



TABLE 2

School- and student-level characteristics: Weighted averages by race/ethnic group by school type,
Grade 4

                                   White MCT       Black MCT      Hispanic MCT
Variable                           Yes     No      Yes     No      Yes     No

School level
  % White students
    M                               77     91       35     37       49     62
    SD                              21     12       30     38       27     36
  % of students/free lunch
    M                               39     39       64     62       46     48
    SD                              32     34       32     30       33     37
  Instructional $ per pupil
    M                               58     49       58     63       60     50
    SD                              23     23       22     15       22     28
Student level
  Family background^a
    M                             6.07   6.21     5.71   5.82     5.42   5.47
    SD                            2.14   2.02     2.17   2.10     2.28   2.11
  Students' academic behaviors
    M                             3.44   3.34     3.46   3.26     3.36   3.21
    SD                            1.70   1.67     1.75   1.80     1.62   1.61
  Student's age
    M                             9.25   9.29     9.34   9.45     9.38   9.36
    SD                             .52    .51      .61    .67      .65    .61

Note. MCT = Minimum Competency Test.
^a Composite of parents' education plus possessions in the home.



TABLE 3

School- and student-level characteristics: Weighted averages by race/ethnic group by school type,
Grade 8

                                   White MCT       Black MCT      Hispanic MCT
Variable                           Yes     No      Yes     No      Yes     No

School level
  % White students
    M                               76     89       37     43       39     39
    SD                              22     16       31     34       27     26
  % of students/free lunch
    M                               23     41       49     58       44     61
    SD                              24     30       28     33       27     30
  Instructional $ per pupil
    M                               63     53       65     55       66     48
    SD                              15     13       16     11       14     11
Student level
  Family background^a
    M                             7.23   7.03     6.39   6.27     5.70   5.24
    SD                            1.62   1.57     1.87   1.95     2.03   2.04
  Students' academic behaviors
    M                             3.63   3.58     3.53   3.48     3.56   3.33
    SD                            1.90   1.68     1.75   1.70     1.77   1.70
  Students' age
    M                             13.3   13.3     13.4   13.7     13.4   13.6
    SD                             .52    .53      .73    .90      .57    .70

Note. MCT = Minimum Competency Test.
^a Composite of parents' education plus possessions in the home.



This model tests the assumption that the 
within-group regression coefficients are ho- 
mogeneous for schools with and without 
MCT programs and that one may test dif- 
ferences between groups after adjusting for 
the effects of other attributes. There were 
nine parallel regression equations, each in- 
corporating the same predictor variables. In- 
dividual reading proficiencies were used as 
dependent measures. In accordance with 
suggested NAEP procedures, each "plausible
value" was used as a dependent measure in
a regression analysis. Thus, there were five 
regression equations for each of the nine 
race/grade groups, or 45 regression equa- 
tions. The five resulting regression coeffi- 
cients for MCT for each race/grade were 
then averaged to arrive at the reported effect. 
Standard errors were adjusted to reflect the 
variance due to uncertainty in these values 
and due to sampling. 
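
The averaging step, and one common way of adjusting the standard error for both sources of variance, can be sketched as follows. The text does not give the exact adjustment used, so the combination rule here (the usual multiple-imputation rule: average sampling variance plus (1 + 1/M) times the variance of the M coefficients) is an assumption, as are the function name and the example numbers.

```python
# Hypothetical sketch: combine regression results from five plausible
# values. The variance rule is the standard multiple-imputation formula,
# assumed here because the text does not spell out the adjustment.
from math import sqrt
from statistics import mean, variance


def combine_plausible_values(coefs, std_errors):
    """Return (pooled coefficient, adjusted standard error)."""
    m = len(coefs)
    pooled = mean(coefs)                        # reported effect
    within = mean(se ** 2 for se in std_errors)  # sampling variance
    between = variance(coefs)                   # uncertainty across values
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Made-up MCT coefficients and standard errors from five regressions.
coefs = [4.0, 5.0, 6.0, 5.0, 5.0]
ses = [2.0, 2.0, 2.0, 2.0, 2.0]
print(combine_plausible_values(coefs, ses))
```

The adjusted standard error is never smaller than the average single-regression standard error, since the spread of the five coefficients adds to the total variance.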

Covariates at the student level were age,
sex, family background, and students' academic
behaviors. Covariates included at the



school level were region of the country, per- 
centage of students on free lunch, and school 
racial composition. The school-level vari- 
ables for MCT and remedial program were 
each dummy coded — 1 = yes and 0 = no — 
and remedial program and instructional dol- 
lars per pupil were included as two potential 
explanatory variables. 

The interactions between the MCT variable
and student age, school-level SES, region,
MCT, percentage of students on free
lunch, percentage of White students, family
background, and students' academic behaviors
were tested as a block, entered last, and
were found to be nonsignificant. All regres- 
sion analyses were conducted on students in 
the grade samples rather than age samples. 
Listwise deletion of missing cases was used 
in all analyses. Because the NAEP sample 
design employs stratifications and clustering 
(students within schools, schools within pri- 
mary sampling units), the resulting sample 
has different statistical characteristics from 
those of a simple random sample. To ac-

TABLE 4

School- and student-level characteristics: Weighted averages by race/ethnic group by school type,
Grade 11

                                   White MCT       Black MCT      Hispanic MCT
Variable                           Yes     No      Yes     No      Yes     No

School level
  % White students
    M                               79     88       45     62       35     55
    SD                              19     15       32     21       29     31
  % of students/free lunch
    M                               15     --       38     --       34     40
    SD                              16     20       27     25       23     25
  Instructional $ per pupil
    M                               61     57       65     50       62     59
    SD                              17     14       15     19       16     11
Student level
  Family background^a
    M                             7.60   7.25     6.72   6.40     5.91   5.83
    SD                            1.47   1.41     1.66   1.62     1.87   1.71
  Students' academic behaviors
    M                             4.30   3.93     4.12   3.86     3.93   3.72
    SD                            2.06   1.94     1.91   1.94     1.93   1.93
  Students' age
    M                             17.0   17.0     17.3   17.3     17.3   17.4
    SD                             .60    .50      .80    .80      .80    .70

Note. MCT = Minimum Competency Test.
^a Composite of parents' education plus possessions in the home.












TABLE 5

Unadjusted average reading proficiency by grade by race/ethnic group by school type

                    Grade 4 MCT                  Grade 8 MCT                  Grade 11 MCT
Race/ethnic group   Yes     No      No resp.     Yes     No      No resp.     Yes     No      No resp.
White
  M                 220.0   227.0   219.6        267.7   260.6   266.9        298.1   290.8   291.1
  SD                31.5    30.5    32.7         27.2    28.5    27.7         29.9    31.5    32.3
Black
  M                 194.9   194.4   189.0        243.9   232.3   239.5        267.7   262.7   261.6
  SD                28.3    28.6    30.5         27.2    27.8    28.0         28.2    31.5    31.3
Hispanic
  M                 197.9   202.3   194.9        243.7   236.4   244.1        269.0   261.9   263.8
  SD                31.1    29.8    32.7         29.8    28.5    27.2         32.2    34.4    34.0

Note. MCT = Minimum Competency Test; resp. = response.



count approximately for the effects of the 
sample design, a design effect of two was 
used. Kish and Frankel (1974) suggest that 
design effects for complex statistics from 
complex samples are greater than 1. This 
has the effect of dividing the sample size in 
half and using the adjusted sample size in 
the computation of errors. This method was 
used in lieu of the Educational Testing Service
jackknife technique employed in estimating
sampling variability of statistics included
in official NAEP reports. (See Johnson, 
1987, for a discussion of design effects used 
to adjust error estimates when using NAEP 
data.) 
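The arithmetic behind a design effect of two can be sketched as follows. The function names and the sample size are hypothetical; the only figure taken from the text is the design effect itself. Halving the sample size is equivalent to multiplying a simple-random-sample standard error by the square root of two.

```python
from math import sqrt


def effective_sample_size(n, deff=2.0):
    """Sample size after discounting for a complex sample design."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return n / deff


def design_adjusted_se(srs_se, deff=2.0):
    """Inflate a simple-random-sample standard error by sqrt(deff)."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return srs_se * sqrt(deff)


# Hypothetical group of 1,200 students with an unadjusted SE of 1.0.
n_eff = effective_sample_size(1200)
se_adj = design_adjusted_se(1.0)
print(f"effective n = {n_eff:.0f}, adjusted SE = {se_adj:.3f}")
```

With a design effect of two, every standard error grows by about 41%, so a coefficient must be correspondingly larger to reach a given significance level.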

Results 

The covariate-adjusted contrasts between 
MCT and non-MCT schools resulting from 
the regressions of reading proficiencies on 
school- and student-level variables for each
race/grade cohort are presented in Table 6. 
For the sake of brevity, the term effect is 
used in discussing the relationships identified
between MCT programs and achievement,
although the findings are correlational
and not necessarily causal. The first column 
presents the effect of MCT after adjusting 
for sex, age, region of the country, school-
level SES, family background, and students'
academic behaviors. The second column 
presents the effect after adjusting for all of 
the previously mentioned variables in addi- 
tion to the explanatory variables, per pupil 
instructional dollars, and school-level re- 
medial reading program. 

Effects adjusted for all variables included 
in the regression analyses for each race/ 
ethnic group by grade level are depicted in 
Figure 1. The complete regression equation 
for each grade group is shown in the Appen- 
dix. 

Grade 4 

At Grade 4, after controlling for all variables,
there were no significant effects attributed
to the MCT dummy variable for any
of the race/ethnic groups. 

Grade 8

At Grade 8, after adjusting for student-
and school-level variables, there were positive
effects for both White and Black eighth
graders. This effect represented about an 8- 
point (.29 SD) advantage for Whites and a 
10-point (.38 SD) advantage for Blacks in 
average reading proficiency as compared 
with their respective counterparts in schools 
without MCT programs. Effect sizes indi- 
cated are calculated as the difference be- 
tween treatment and comparison adjusted 
means divided by the standard deviation of 
the comparison group (Glass, 1977). The
inclusion of remedial program and instruc- 
tional dollars as variables explained part but 
not all of the MCT effect. Inclusion of these 
variables reduced the effect for White stu- 
dents by about 29% and for Black students 
by about 31%. No significant effect was 
isolated for Hispanic students. 
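The two calculations in this paragraph, the effect size and the percentage of the effect explained, can be sketched as below. The coefficients passed to `percent_reduction` are the Grade 8 values reported in Table 6; the adjusted means and standard deviation passed to `glass_delta` are hypothetical, since the article does not list the adjusted means. Both function names are illustrative.

```python
def glass_delta(adj_mean_treatment, adj_mean_comparison, sd_comparison):
    """Effect size: adjusted mean difference over the comparison-group SD."""
    if sd_comparison <= 0:
        raise ValueError("standard deviation must be positive")
    return (adj_mean_treatment - adj_mean_comparison) / sd_comparison


def percent_reduction(before, after):
    """Share of an effect removed when explanatory variables are added."""
    if before == 0:
        raise ValueError("initial effect must be nonzero")
    return 100.0 * (before - after) / before


# Hypothetical adjusted means, for illustration only.
delta = glass_delta(268.0, 260.0, 32.0)

# Grade 8 coefficients from Table 6: adjusted step, then final step.
white_cut = percent_reduction(7.79, 5.54)
black_cut = percent_reduction(10.90, 7.60)
print(f"delta = {delta:.2f}, White = {white_cut:.1f}%, Black = {black_cut:.1f}%")
```

The reductions computed from the tabled coefficients come out at roughly 29% and 30%, in line with the figures quoted above.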

Grade 11

At Grade 11, after controlling for student-
and school-level variables, there were posi- 
tive effects for all race/ethnic groups. This 
effect reflected a 2-point (.06 SD) advantage
in average reading proficiency for White stu- 
dents attending schools with MCT pro- 
grams, a 7-point (.26 SD) advantage for 
Blacks, and a 6-point (.19 SD) advantage for 
Hispanics as compared with their respective 
counterparts in schools without MCT pro- 
grams. Inclusion of remedial program and 
instructional dollars explained the effect for 



TABLE 6

Covariate-adjusted contrasts for MCT (Minimum Competency Test) and non-MCT schools

                    Grade 4                      Grade 8                      Grade 11
Race/ethnic group   Adjusted (a)  Final step (b) Adjusted      Final step     Adjusted      Final step
White
  b                 -1.66         -1.08          7.79**        5.54**         2.18*         .42*
  SE                1.09          1.08           1.29          1.51           1.01          1.58
Black
  b                 0.30          2.89           10.90***      7.60***        6.62***       12.34***
  SE                2.58          3.97           2.51          3.23           2.91          3.48
Hispanic
  b                 -2.86         3.00           -1.21         0.06           5.93**        5.76**
  SE                3.01          3.23           3.50          3.71           3.48          4.33

(a) At each grade level, effect is adjusted for sex, student age, regions of the country, school-level socioeconomic status, family background, and student academic behaviors.
(b) At each grade level, effect is adjusted for student- and school-level variables in addition to school-level remedial program (dummy coded 1 = yes, 0 = no) and instructional dollars per pupil.
*p < .05. **p < .01. ***p < .001.
[Figure: bar chart of effect size by grade level; legend: Black, White, Hispanic.]

FIGURE 1. Within-race school-level minimum competency testing program effect in standard deviation
units and adjusted for control variables.



White students, accounted for a negligible 
portion of the effect for Hispanic students, 
and caused the effect for Black students to 
become larger. This increase was a case of 
statistical suppression. A specific remedial 
program in the school, although not corre- 
lated with proficiency (r = .01), was corre- 
lated with having an MCT program (r = .70) 
and thus adds irrelevant variance to the vari- 
able MCT and reduces the relationship with 
proficiency. Instructional dollars per pupil, 
although not correlated with proficiency 
(r = .04), was correlated with MCT (r = .27),
and acted in a similar manner. The statistical 
explanation is that the inclusion of these two 
variables in the equation suppresses the un- 
wanted variance in reading proficiency and 
increases the relationship between profi- 
ciency and MCT. (For a discussion of 
suppression in complex regression models, 
see Cohen & Cohen, 1983.) Theoretically, 
this finding is troublesome because school- 
level MCT programs are accompanied by 
extra instructional resources and a remedial 
program. We know little concerning imple- 
mentation effects or student achievement 
outcomes attributable to these factors. 
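Suppression is easiest to see in a model with only two standardized predictors, where each coefficient has a closed form in terms of the three correlations. The sketch below is a simplification: the regressions reported here contain many more covariates, and the correlation between MCT and proficiency (0.20) is a hypothetical value chosen for illustration. The correlations of .01 and .70 for the remedial program are the ones quoted above.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for y regressed on two predictors.

    r_y1, r_y2 -- correlation of each predictor with the outcome
    r_12       -- correlation between the two predictors
    """
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("predictors must not be perfectly correlated")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


r_mct = 0.20        # hypothetical MCT-proficiency correlation
r_remedial = 0.01   # remedial program with proficiency (from the text)
r_between = 0.70    # remedial program with MCT (from the text)

beta_mct, beta_remedial = standardized_betas(r_mct, r_remedial, r_between)
print(f"MCT alone: {r_mct:.2f}; with suppressor: {beta_mct:.2f}")
print(f"remedial program coefficient: {beta_remedial:.2f}")
```

Under these assumptions the MCT coefficient nearly doubles once the remedial program enters the equation, and the remedial program takes a negative coefficient despite its near-zero correlation with proficiency. That is the pattern the text describes for Black 11th graders.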

Discussion 

In general, these results suggested a posi- 
tive relationship between school-level MCT 
programs and reading proficiency at the up- 
per grade levels, but not at the elementary 






school level. The discussion of why this is 
the case is quite speculative because of the 
nature of the study. 2 

Grade 4 

The failure to find a relationship at fourth 
grade for any of the race/ethnic groups sug- 
gests that there may be little or no advantage 
in implementing MCT programs at this 
grade level. In elementary schools, there is a 
general emphasis on instruction in basic 
skills, particularly reading, so the addition 
of an MCT program may be redundant. 
Because only one time point is being exam- 
ined, the direction of causality between the 
variables cannot be established. Schools with 
MCT had students with lower reading pro- 
ficiency, and perhaps this situation resulted 
in schools' implementing a local MCT pro- 
gram. Additionally, many other important 
variables (e.g., classroom practices, aca- 
demic engaged time, and content covered) 
are critical in explaining students' reading
proficiency but were not included in these
analyses (Winfield, 1987b). It appears that
policymakers who advocate testing reforms 
at this level might better use resources to 
strengthen teaching and instruction. 

Grade 8

At eighth grade, a positive relationship 
between school-level MCT programs and 
reading was isolated for White and Black 
students, but not Hispanic students. One
possible explanation for finding effects on 
Black and White students is that by restrict- 
ing the variance in background factors to 
within groups, the probability increased of 
demonstrating an effect due to school vari- 
ables. This should also hold for fourth grade, 
but unreliability of student self-report meas- 
ures may be a source of error. Other studies, 
using a similar method, have found that 
among schools attended by students of the 
same SES background or race, it was possi- 
ble to identify some that were consistently 
"effective or ineffective" (Frederiksen, 
1975). 

An alternative explanation is that there 
may be other factors associated with the 
program but not measured (e.g., monitoring 
of student progress) which contribute to this 
effect. The distributions of White and Black 
students' NAEP proficiency levels in schools 
with MCT programs were shifted upward 
for all students, not just for those students 
at the lower reading proficiency levels (Win- 
field, 1987a). Additional descriptive analyses 
of the sample schools indicated a higher 
percentage of students in both gifted and 
remedial reading programs compared to 
schools without MCT programs (Winfield,
1988). This suggests that the identified
"MCT effect" may not be due solely to MCT
but to other school-related conditions and 
characteristics. 

The failure to find an MCT effect on the 
reading proficiency of Hispanic eighth grad- 
ers suggests that the variables included in 
the analysis may be insufficient to explain 
reading proficiency of Hispanic students at 
this grade level. Other variables might influ- 
ence proficiency for these students — for ex- 
ample, language dominance, language spo- 
ken in the home and in peer groups, and 
years of residence in the United States (Or- 
tiz, 1986). Alternatively, the school variables 
included may operate differently in different 
contexts. Hanushek (1970) found differ- 
ences in teachers and classrooms related to 
the achievement of White students but not 
Mexican students. These results suggest the 
need to investigate effects of school-level 



MCT for Hispanic students at this grade 
level. 

Grade 11 

At 11th grade, the MCT school effect on
White students' reading proficiency could be 
statistically explained by the inclusion of 
school-level remedial program and instructional
dollars per pupil. For Blacks and Hispanics,
this was not the case. Remedial program
and instructional dollars were suppressor
variables in the regression equation for
Blacks. Other research suggests that remedial 
programs for MCT may be less effective in 
facilitating reading achievement for Black 
students than for White students (Serow,
1984). However, remedial program in this
study was a school-level rather than a stu- 
dent-level variable, and therefore it cannot 
adequately address this issue. The results of 
this study are merely suggestive that MCT 
remedial programs may have different ef- 
fects on different groups. 

The positive effects isolated at 11th grade
for each race/ethnic group may be due to 
proximity to graduation. MCT may be more 
meaningful to students at this grade level, 
especially if it must be passed in order to 
graduate. Alternatively, the effects may be 
due to other unmeasured characteristics of 
these particular schools. Information on 
school retention or dropout rates was not 
available, and it may be that schools that 
have institutionalized an MCT program 
have higher levels of dropout among lower 
performing students (Serow & Davies, 
1982). Thus, those students who are doing 
poorly or have failed an MCT may no longer 
be in school at 11th grade. In general, Black
and Hispanic dropout rates are higher than
those of White students (Plisko & Stern,
1985), so the results obtained for the 11th
grade may pertain to a more select popula- 
tion than the populations in the 4th and 8th 
grades. This may be especially true of mi- 
nority groups. Burton and Jones (1982) 
reached a similar conclusion regarding 
NAEP data for 17-year-olds. In a compari- 
son of achievement trends of Black and 
White youth, they suggested that it would 
not be possible to assess whether the relative 
improvement observed in the Black population
at ages 9 and 13 persisted at age 17
because of the differential dropout rates by 
race and sex. An out-of-school sample of 17-year-olds
is needed to assess the trend (Burton
& Jones, 1982; Alexander-James, 1987).
Without adequate dropout statistics for the 
particular sample studied, the issue of sample
selectivity at 11th grade cannot be resolved.

Conclusions and Implications 

Because of the exploratory nature of the 
studies presented here, caution must be ex- 
ercised in generalizing to all schools in the 
nation. The results suggest that relationships 
between MCT programs and student reading 
proficiency outcomes may be varied for dif- 
ferent grades, race/ethnic groups, and types 
of programs. At the elementary level, no 
relationship was isolated for any of the race/ 
ethnic groups studied. At 8th grade, positive 
effects of school-level MCT programs were 
isolated for White and Black students, but 
not Hispanic students. Similarly, at 11th 
grade, positive effects of school-level MCT 
were isolated for each race/ethnic group. 

The remaining discussion will address the 
limitations of the study and how future re- 
search in this area might address some of 
these problems. One issue in conducting 
secondary data analysis is the appropriate- 
ness of data used for addressing questions of 
interest. It is generally the case that most of 
the available large-scale data bases have been 
less than optimal for conducting certain 
types of policy analyses (Plisko, Ginsburg, & 
Chaikind, 1986). The primary purpose of 
NAEP — to serve as "The Nation's Report 
Card" — has led to very careful test construc- 
tion in order to measure changes in student 
learning and educational competence in 
core subject areas. Because the major goal 
of NAEP is to estimate population and sub- 
population means, each student may answer 
only a few of the large number of assessment 
items, and an individual reading proficiency 
estimate is not attained. The "plausible val- 
ues" used as indicators of individual reading 
proficiency are intermediate steps to yield 
consistent estimates of selected margins of 



the national population — specifically, gen- 
der, ethnicity, parents' education, size and 
type of community, age, region of the coun- 
try, and grade. These are referred to as "con- 
ditioning" variables. Analyses involving any 
other background variables are subject to 
regression effects. Thus, coefficients for 
school variables in this study are underestimated
by about 15%-20% (R. Mislevy, personal
communication, August 15, 1986),
and coefficients for conditioning variables 
are inflated. Although the substance of any 
conclusions derived from these studies 
would be essentially unchanged if all biases 
were removed, any effect due to a school- 
level variable is extremely conservative when 
using NAEP data. Refinements in the tech- 
nology of NAEP since the time of the current 
study have minimized this problem by con- 
ditioning on a larger number of background 
variables, for example, as in the NAEP study 
of young adult literacy (Kirsch & Jungeblut, 
1986). In the future, secondary bias in NAEP 
data may be eliminated altogether by con- 
ditioning on well-chosen linear combina- 
tions of large numbers of variables (Mislevy, 
1988). 

The collection of school variables has im- 
proved the value of the assessment as a 
policy research tool. However, for investi- 
gating relationships between MCT programs 
and reading proficiency, future large-scale 
surveys — whether NAEP or another sur- 
vey — might collect additional details on the 
exact nature of the school-level program. At 
a minimum, it is important to know the 
purpose, special personnel or curriculum 
used, nature of the state mandate, the pro- 
portion of students in the schools who failed 
to meet requirements, and school-level re- 
tention and dropout rates. This information 
would permit ruling out some of the plau- 
sible rival hypotheses suggested for results 
reported in this study and would permit a 
more precise categorization of MCT pro- 
grams. For example, whether a program is 
used for remediation funding or promotion/ 
graduation might differentially influence 
student reading proficiency. The use of prin- 
cipals' self-reports concerning whether there 
is a school-level MCT program is admittedly 
limited in defining the nature, implementation,
and practices affecting student outcomes
in reading.

Any explanation of student outcomes 
needs to consider teachers and classroom 
practices, as well as school variables. The 
effects of these variables are subsumed some- 
what in an overall school effect; however, 
teacher or classroom-level variables deserve 
direct examination. These were not within 
the scope of the study reported here and 
would be important to include in future 
studies. Also, information on student partic- 
ipation in MCT and remedial programs 
should be collected at the individual student 
level rather than at the school level. In con- 
ducting multilevel analyses of school data, it 
is known that many of the assumptions of 
using aggregated data do not reflect reality. 
For example, school-level aggregates are as- 
sumed to affect all students equally, but 
pupils receive differential exposure to school 
resources and facilities. This may be espe- 
cially true with respect to low-achieving stu- 
dents who are more likely to participate in 
MCT-related remediation and be placed in 
lower-tracked classes (Braddock, 1990; Lee 
& Bryk, 1988). Other individual student- 
level data, not available in NAEP, include a 
measure of SES such as parents' occupational
status. Future analyses should include
this measure as a covariate to adjust for 
student background. 

Large-scale, cross-sectional data as used in 
this study do not permit one to infer the 
direction of causality or to understand the 
nature of the process. In order to assess 
change in students' reading proficiency that 
may result from implementation of an MCT 
program, data must be available for two or 
more time periods, and preferably for co- 
horts of students. Moreover, school-level 
MCT program was used as a proxy variable 
for conditions within schools which influ- 
ence student reading outcomes. The process 
variables, interactions, and local adaptations 
within schools which impact student outcomes
(such as teachers' expectations, resource
allocation and use, and students' opportunity
to learn) might be more appropriately
studied by using qualitative




methods. (See Oakes, 1989, for a discussion 
of educational indicators and school con- 
text.) 

The diversity of local and state policies 
related to MCT makes it extremely difficult 
to characterize precisely the nature and out- 
comes of the testing reforms. The NAEP 
data were not designed to specifically ad- 
dress such issues; however, this study dem- 
onstrates both the limits and potential of 
using redesigned NAEP data to explore the 
relation between testing reforms and 
achievement from a national perspective. 
For policymakers interested in the use of 
testing to improve schools, there is modest 
evidence that a positive relationship exists, 
at least at the middle and secondary school 
level. We know, however, that any reform 
operates on the levels of policy, administra- 
tion, and practice — each with its own re- 
wards, incentives and limitations (Elmore & 
McLaughlin, 1988). To the extent that 
NAEP in its redesign can inform how teach- 
ers teach and how school organization af- 
fects practice in schools implementing MCT 
programs, we may begin to understand how 
this reform has its impact on student 
achievement outcomes. In the future, NAEP 
will conduct a state-by-state assessment of 
reading and mathematics achievement. The 
present study, with its noted limitations, 
foreshadows some of the problems inherent 
in isolating achievement effects that may 
result from various school reforms. 

Notes 

1 The use of hierarchical linear modeling (Raudenbush
& Bryk, 1988-1989) provides an optimal
method for analyzing data from studies of
school and classroom effects. The NAEP samples 
too few students within each school to provide a 
stable estimate. 

2 The author is aware that the implications of 
the study for policy are limited due to the explor- 
atory nature of the study and use of the redesigned 
NAEP data. In attempting to arrive at an estimate 
of the relationship between testing reforms in the 
nation and student reading achievement by using 
NAEP data, one misses the local context and 
implementation effects — factors that are closely 
linked to student reading achievement outcomes. 
Local school- or district-level data may be more
appropriate for developing policies regarding 
school organization and achievement outcomes.
The policy dynamics of school reform, as well as
changing schools, are highly complex (Timar &
Kirp, 1989). Reanalyses of NAEP data, however,
may be informative in describing existing relationships
among school policy, teacher behaviors,
and student achievement outcomes.
Achievement Effects of Ability Grouping in
Secondary Schools: A Best-Evidence Synthesis 

Robert E. Slavin 

Johns Hopkins University 

This article reviews research on the effects of ability grouping on the achievement 
of secondary students. Six randomized experiments, 9 matched experiments, and 
14 correlational studies compared ability grouping to heterogeneous plans over 
periods of from one semester to 5 years. Overall achievement effects were found to 
be essentially zero at all grade levels, although there is much more evidence 
regarding Grades 7-9 than 10-12. Results were similar for all subjects except social 
studies, for which there was a trend favoring heterogeneous placement. Results were 
close to zero for students of all levels of prior performance. This finding contrasts 
with those of studies comparing the achievement of students in different tracks, 
which generally find positive effects of ability grouping for high achievers and 
negative effects for low achievers, and these contrasting findings are reconciled.

For more than 70 years, ability grouping has been one of the most controversial 
issues in education. Its effects, particularly on student achievement, have been 
extensively studied over that time period, and many reviews of the literature have 
been written. In recent years, a comprehensive review of the achievement effects 
of ability grouping in elementary schools has been published by Slavin (1987), but 
only brief meta-analyses by Kulik and Kulik (1982, 1987) have reviewed the 
evidence on ability grouping and heterogeneous placement in secondary schools. 

The purpose of this paper is to present a comprehensive review of all research 
published in English that has evaluated the effects of ability grouping on student 
achievement in secondary schools. Secondary schools are defined here as middle,
junior, or senior high schools in the United States, or similarly configured secondary 
schools in other countries. Secondary schools can include grades as low as five, but 
they usually begin with sixth or seventh grades. Ability grouping is defined as any 
school or classroom organization plan that is intended to reduce the heterogeneity 
of instructional groups; in between-class ability grouping the heterogeneity of each 
class for a given subject is reduced, and in within-class ability grouping the 
heterogeneity of groups within the class (e.g., reading groups) is reduced. 

Unlike the situation in elementary schools, the type of ability grouping used in 
secondary schools is overwhelmingly between-class grouping (McPartland, Coldiron,
& Braddock, 1987). Several closely related forms of ability grouping are used.
Sometimes students are assigned on the basis of some combination of composite 
achievement, IQ, and teacher judgments to a track, within which all courses are 
taken. For example, senior high school students are often assigned to academic, 
general, and vocational tracks; middle/junior high school students are often as- 
signed to advanced, basic, and remedial tracks (in either case, the number of tracks 
and the names used to describe them vary widely). This type of grouping plan is 
generally called tracking in the United States or streaming in Europe. It is an 
example of what Slavin (1987) called "ability-grouped class assignment." In addition
to assignment to higher and lower sections of the same courses, tracking in senior 
high schools usually involves different courses or course requirements. For example, 
a student in the academic track may be required to take more years of mathematics 
than a student in the general track, or may take French III rather than metal shop. 

A particular form of tracking often seen in middle/junior high schools is block 
scheduling, where students spend all or most of the day with one homogeneous 
group of students. Some schools rank-order students from top to bottom and assign 
them to, say, 7-1, 7-2, 7-3, and so on. Many senior high schools allow students to 
choose their track or to choose the level they wish to take in each subject, but in 
plans of this kind counselors tend to steer students into the level of classes to which 
they would have been assigned if the school were not allowing students a choice 
(Rosenbaum, 1978). 

Another form of ability grouping common in secondary schools involves assign- 
ing students to ability-grouped classes for all academic subjects, but allowing for 
the possibility that students will be placed in a high-ranking group for one subject 
and a low-ranking group for another. In practice, scheduling constraints often make 
this type of grouping similar to plans in which all courses are taken within the same 
track. In some cases schools ability group for some subjects and not for others; for 
example, students may be in ability-grouped math and English classes but in 
heterogeneous social studies and science classes. Ability grouping usually involves 
higher and lower sections of the same course, but sometimes consists of assignment 
to completely different courses, as when ninth graders are assigned either to Algebra 
I or to general math. When high achievers are assigned to markedly different 
courses usually offered to older students (as when seventh graders take algebra), 
this is called acceleration. More commonly, high achievers may be assigned to 
"honors" or "advanced placement" sections of a given course, and low achievers 
may be assigned to special "remedial" sections. 

Although between-class ability grouping is by far the most common type of 
ability grouping in secondary schools, forms of within-class grouping are also 
occasionally seen. These are plans in which students are assigned to homogeneous 
instructional groups within their classes. Within-class ability grouping, such as use 
of reading or math groups, is the most common form of grouping at the elementary 
level (McPartland et al., 1987). Complex plans, such as those that involve grouping 
across grade lines, flexible grouping for particular topics, and part-time grouping, 
are also occasionally seen in secondary schools. In general, a wider range of grouping 
plans are used in middle/junior high schools than in senior high schools. 

Arguments for and against ability grouping have been essentially similar for 70 
years. For example, Turney (1931), summarizing writings of the 1920s, listed 
advantages and disadvantages of ability grouping. The advantages were as follows:

1. It permits pupils to make progress commensurate with their abilities. 

2. It makes possible an adaption of the technique of instruction to the needs of 
the group. 

3. It reduces failures. 

4. It helps to maintain interest and incentive, because bright students are not 
bored by the participation of the dull. 

5. Slower pupils participate more when not eclipsed by those much brighter. 

6. It makes teaching easier. 

7. It makes possible individual instruction to small slow groups. 
The following were the disadvantages: 

1. Slow pupils need the presence of the able students to stimulate them and
encourage them. 

2. A stigma is attached to low sections, operating to discourage the pupils in 
these sections. 

3. Teachers are unable, or do not have time, to differentiate the work for different 
levels of ability. 

4. Teachers object to the slower groups. 

A research symposium, school board meeting, or PTA meeting on the topic of 
ability grouping in 1990 is likely to bring up much the same arguments on both 
sides, with two important additions: the argument that ability grouping discrimi- 
nates against minority and lower-class students (e.g., Braddock, 1990; Rosenbaum, 
1976), and the argument that students in the low tracks receive a lower pace and 
lower quality of instruction than do students in the higher tracks (e.g., Gamoran, 
1989; Oakes, 1985).

In essence, the argument in favor of ability grouping is that it will allow teachers 
to adapt instruction to the needs of a diverse student body and give them an 
opportunity to provide more difficult material to high achievers and more support 
to low achievers. The challenge and stimulation of other high achievers are believed 
to be beneficial to high achievers (see Feldhusen, 1989). Arguments opposed to 
ability grouping focus primarily on the perceived damage to low achievers, who 
receive a slower pace and lower quality of instruction, have teachers who are less 
experienced or able and who do not want to teach low-track classes, face low 
expectations for performance, and have few positive behavioral models (e.g., 
Gamoran, 1989; Oakes, 1985; Persell, 1977; Rosenbaum, 1980). Because of the 
demoralization, low expectations, and poor behavioral models, students in the low 
tracks are believed to be more prone to delinquency, absenteeism, dropout, and 
other social problems (Crespo & Michelna, 1981; Wiatrowski, Hansell, Massey, & 
Wilson, 1982). With few college-bound peers, students in low tracks have been 
found to be less likely to attend college than other students (Gamoran, 1987). 
Ability grouping is perceived to perpetuate social class and racial inequities because 
lower class and minority students are disproportionately represented in the lower
tracks. Ability grouping is often considered to be a major factor in the development 
of elite and under-class groups in society (Persell, 1977; Rosenbaum, 1980). Perhaps 
most important, tracking is believed to work against egalitarian, democratic ideals
by sorting students into categories from which escape is difficult or impossible. 

There are important differences between the pro-grouping and anti-grouping 
positions that go beyond the arguments themselves. Arguments in favor of ability 
grouping focus on effectiveness, saying in effect that as distasteful as grouping may 
be, it so enhances the learning of students (particularly but not only high achievers) 
that its use is necessary. In contrast, arguments opposed to grouping focus at least 
as much on equity as on effectiveness and on democratic values as much as on 
outcomes. In one sense, then, the burden of proof is on those who favor grouping, 
for if grouping is not found to be clearly more effective than heterogeneous 
placement, none of the pro-grouping arguments apply. The same is not true of 
anti-grouping arguments, which provide a rationale for abolishing grouping that 
would be plausible even if grouping were found to have no adverse effect on 
achievement. 

Research on the achievement effects of ability grouping has taken two broad 
forms. One type of research compares the achievement gains of students who are 
in one or another form of grouping to those of students in ungrouped, heterogeneous 
placements. Another type of research compares the achievement gains made by 
students in high-ability groups to those made by students in the low groups.

Reviews of the grouping versus nongrouping literature have consistently shown
that grouping has little or no impact on overall student achievement in elementary 
and secondary schools (e.g., Borg, 1965; Esposito, 1973; Findley & Bryan, 1971; 
Good & Marshall, 1984; Heathers, 1969; Kulik & Kulik, 198?). Primarily on the
basis of his own empirical research, Borg (1965) claimed that ability grouping had 
a slight positive effect on the achievement of high achievers and a slight negative 
effect on low achievers, but Kulik and Kulik (1987) found no such trend. 

In contrast, researchers who have compared gains made by students in different 
tracks have generally concluded that controlling for ability level, socioeconomic 
status, and other control variables, being in the top track accelerates achievement 
and being in the low track significantly reduces achievement (Alexander, Cook, & 
McDill, 1978; Dar & Resh, 1986; Gamoran & Berends, 1987; Gamoran & Mare, 
1989; Oakes, 1982; Persell, 1977; Sorensen & Hallinan, 1986). In fact, many 
researchers and theorists in the sociological tradition maintain that tracking is a 
principal source of social inequality in society and that it causes or greatly magnifies 
differences along lines of class and ethnicity (e.g., Braddock, 1990; Jones, Erickson, 
& Crowell, 1972; Schafer & Olexa, 1971; Vanfossen, Jones, & Spade, 1987). 

One area of research has investigated the quality of instruction offered to students 
in high- and low-ability groups, usually concluding that low-ability group classes 
receive instruction that is significantly lower in quality than that received by 
students in high-track classes (e.g., Evertson, 1982; Gamoran, 1989; Oakes, 1985; 
Trimble & Sinclair, 1987). However, it is difficult to compare "quality of instruc- 
tion" in high- and low-track classes. For example, teachers typically cover less 
material in low-track classes (e.g., Oakes, 1985). Is this an indication of poor quality 
of instruction or an appropriate pace of instruction? Students in low-track classes 
are more off-task than those in high-track classes (e.g., Evertson, 1982). Is this due 
to the poor behavioral models and low expectations in the low-track classes, or 
would low achievers be more off-task than high achievers in any grouping arrange- 
ment? Evidence that low-track classes are often taught by less experienced or less 
qualified teachers or that they manifest other objective indicators of lower-quality
instruction could justify the conclusion that regardless of measurable effects on 
learning, students in the lower tracks do not receive equal treatment, but such 
evidence is rare. 

In addition to synthesizing research on overall effects of ability grouping on the 
achievement of high-, average-, and low-achieving secondary students, this review
will attempt to reconcile research comparing achievement gains in different tracks 
with research comparing grouped and ungrouped settings. 

Review Methods 

This review uses a procedure called "best-evidence synthesis" (Slavin, 1986), 
which incorporates the best features of meta-analytic and traditional reviews. Best-
evidence syntheses specify clear, well-justified methodological and substantive criteria
for inclusion of studies in the main review and describe individual studies and 
critical research issues in the depth typical of good-quality narrative reviews. 
However, whenever possible, effect sizes are used to characterize study outcomes, 
as in meta-analyses (Glass, McGaw, & Smith, 1981). Systematic literature search 
procedures, also characteristic of meta-analysis, are similarly applied in best- 
evidence syntheses. 

Criteria for Study Inclusion 

The studies on which this review is based had to meet a set of a priori criteria 
with respect to relevance to the topic and methodological adequacy. First, all studies 
had to involve comprehensive ability grouping plans that incorporated most or all 
students in the school. This excludes studies of special programs for the gifted or 
other high achievers as well as studies of special education, remedial programs, or 
other special programs for low achievers. Studies of within-class ability grouping 
are included, but studies of such grouping-related programs as individualized 
instruction, mastery learning, cooperative learning, and continuous-progress group- 
ings are excluded. 

Studies had to be available in English, but otherwise no restrictions were placed 
on study location or year of publication. Every attempt was made to locate 
dissertations and other unpublished documents in addition to the published
literature.

Methodological requirements for inclusion. Criteria for inclusion of studies in the 
main review were essentially identical to those used in an earlier review of 
elementary ability grouping (Slavin, 1987). These were as follows: 

1. Ability-grouped classes were compared to heterogeneously grouped classes. 
This requirement excluded a few studies that correlated "degree of heterogeneity" 
with achievement gain (e.g., Millman & Johnson, 1964; Wilcox, 1963). Studies 
that compared achievement gains for students in different tracks but not to 
heterogeneous classes (e.g., Alexander et al., 1978) were excluded from the main 
review but are discussed in a separate section. 

2. Achievement data from standardized or teacher-made tests were presented. 
This excluded many anecdotal reports and studies that used grades as the dependent 
measure. Teacher-made tests, used in a very small number of studies, were accepted 
only if there was evidence that they were designed to assess objectives taught in all
classes. 

3. Initial comparability of samples was established by use of random assignment 
or matching of students or classes. When individual students in intact schools or 
classes were matched, evidence had to be presented that the intact groups were 
comparable. 

4. Ability grouping had to be in place for at least a semester. 

5. At least three ability-grouped and three control classes were involved. 

The criteria outlined above excluded very few studies comparing comprehensive 
ability grouping plans to heterogeneous placements. Every study located that 
satisfied criteria 1, 2, and 3 also satisfied criteria 4 and 5. Excluding studies of 
special programs for high achievers (e.g., Atkinson & O'Connor, 1963), all but two 
of the studies included in meta-analyses by Kulik and Kulik (1982, 1987) were also 
included in the present review. The exceptions were a study by Adamson (1971) 
that had substantial IQ differences favoring the ability-grouped school and one by 
Wilcox (1963) that compared more and less heterogeneous tracked classes. 

One major category of studies included in the present review but excluded by 
the Kuliks includes studies that did not present data from which effect sizes could
be computed (e.g., Borg, 1965; Ferri, 1971; Lovell, 1960; Postlethwaite & Denton, 
1978). These studies are discussed in terms of the direction and statistical signifi- 
cance of their findings. 

Literature Search Procedures 

The studies included here were located in an extensive search. Principal sources 
included the Education Resources Information Center (ERIC), Dissertation Ab- 
stracts International, and citations made in other reviews, meta-analyses, and 
primary sources. Every attempt was made to obtain a complete set of published 
and unpublished studies that met the criteria outlined above. 

Computation of Effect Sizes 

Effect sizes were generally computed as the difference between the experimental 
and control means divided by the control group's standard deviation (Glass et al.,
1981). In the ability grouping literature, the heterogeneous group is almost always 
considered the control group, and this convention is followed in the present article; 
positive effect sizes are ones that favored ability grouping, whereas negative effect 
sizes indicated higher means in the heterogeneous groups. The standard deviation 
of the heterogeneous group is also preferred as the denominator because of the 
possibility that ability grouping may alter the distribution of scores. However, when 
means or standard deviations were omitted in studies that otherwise met the 
inclusion criteria, effect sizes were estimated when possible from ts, Fs, exact p
values, sums of squares in factorial designs, or other information, following proce- 
dures described by Glass et al. (1981). 
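
The basic computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `glass_delta` and the scores are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def glass_delta(grouped_scores, heterogeneous_scores):
    """Effect size = (ability-grouped mean - heterogeneous mean) / heterogeneous SD.

    The heterogeneous classes serve as the control group, so a positive
    value favors ability grouping and a negative value favors
    heterogeneous placement.
    """
    control_sd = stdev(heterogeneous_scores)  # assumption: sample (n - 1) SD
    return (mean(grouped_scores) - mean(heterogeneous_scores)) / control_sd


# Hypothetical posttest scores, invented purely for illustration.
grouped = [52, 55, 61, 58, 49, 60]
heterogeneous = [50, 57, 62, 55, 51, 61]
es = glass_delta(grouped, heterogeneous)
print(f"ES = {es:+.2f}")
```

Dividing by the heterogeneous group's standard deviation alone, rather than a pooled value, follows the rationale given in the text: grouping itself may change the spread of scores.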

Several of the studies included in this review presented data comparing gain 
scores without reporting actual pre- or posttest means. Standard deviations of gain 
scores are typically lower than those of raw scores (to the degree that pre-post 
correlations exceed +0.5), so effect sizes computed on gain scores are often inflated. 
If pre-post correlations are known, effect sizes from all scores can be transformed 
to the scale of posttest values. However, because none of the studies using gain
scores also provided pre-post correlations, a pre-post correlation of +0.8 was
assumed (following Slavin, 1987). Using a formula from Glass et al. (1981), this
correlation produces a multiplier of 0.632, which was used to deflate effect size 
estimates from gain score data. The purpose of this procedure and others was to 
attempt to put all effect size estimates in the same metric, the unadjusted standard 
deviation of the heterogeneous classes. However, because this multiplier is only a 
rough approximation, effect sizes from studies using gain scores should be inter- 
preted with even more caution than that which is warranted for effect sizes in 
general. 
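
The 0.632 multiplier can be reproduced from the variance of a difference score. The sketch below assumes that pretest and posttest standard deviations are equal, in which case the gain-score standard deviation is the raw standard deviation times sqrt(2(1 - r)). That this is the formula intended by the citation is an assumption, although it does yield the reported value; the function names are hypothetical.

```python
import math


def gain_score_multiplier(pre_post_r):
    """Ratio of gain-score SD to raw-score SD.

    Assumes equal pretest and posttest SDs, so that
    Var(post - pre) = 2 * s**2 * (1 - r).
    """
    return math.sqrt(2.0 * (1.0 - pre_post_r))


def deflate_gain_effect_size(es_gain, pre_post_r=0.8):
    """Rescale an effect size computed on gain scores to the raw-score metric."""
    return es_gain * gain_score_multiplier(pre_post_r)


print(round(gain_score_multiplier(0.8), 3))  # multiplier for r = +0.8
print(gain_score_multiplier(0.5))            # at r = +0.5 no adjustment is needed
```

At r = +0.5 the multiplier equals 1, which is consistent with the statement that gain-score standard deviations fall below raw-score standard deviations only when the pre-post correlation exceeds +0.5.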

Another deviation from usual meta-analytic procedure used in the present review
involved adjustments of posttest scores for any pretest differences. This was done 
either by subtracting pretest means from posttests (if the same tests were used), by 
converting pre- and posttest means to z scores and then subtracting (if different
tests were used), or by using covariance-adjusted scores. However, even when such 
adjustments were made, affecting the numerator of the effect size formula, the 
denominator remained the unadjusted posttest standard deviation. 
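
For the simplest case, in which the same test is given before and after, the adjustment might be sketched as follows. The function and parameter names are hypothetical, and the numbers in the usage line are invented for illustration.

```python
def adjusted_effect_size(pre_grouped, post_grouped,
                         pre_hetero, post_hetero,
                         post_sd_hetero):
    """Pretest-adjusted effect size for the same-test case.

    The numerator is the difference in mean gains (posttest minus pretest)
    between ability-grouped and heterogeneous classes. The denominator is
    left as the unadjusted posttest SD of the heterogeneous classes.
    """
    gain_difference = (post_grouped - pre_grouped) - (post_hetero - pre_hetero)
    return gain_difference / post_sd_hetero


# Hypothetical group means: the grouped classes start 2 points higher
# but gain 1 point less than the heterogeneous classes.
es = adjusted_effect_size(pre_grouped=50, post_grouped=60,
                          pre_hetero=48, post_hetero=59,
                          post_sd_hetero=10)
print(f"ES = {es:+.2f}")
```

Without the adjustment, the same hypothetical data would give a positive effect size, because the grouped classes have the higher posttest mean despite the smaller gain.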

One effect size is reported for each study (see Bangert-Drowns, 1986). When 
multiple subsamples, subjects, or tests were used, medians were computed across
the data points. For example, if four measures were used with three subgroups (e.g., 
high, middle, and low achievers), the effect size for the study as a whole would be 
the median of the 12 (4 x 3) resulting effect sizes. Whenever possible, findings were 
also broken down by achievement level (high, average, low), and separate effect 
sizes were computed for each major subject.
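
The 4 x 3 example can be illustrated with a short sketch. The twelve effect sizes below are hypothetical values invented for illustration, as is the function name `study_effect_size`.

```python
from statistics import median

# Hypothetical effect sizes: 4 measures for each of 3 achievement levels.
effect_sizes = {
    "high":    [0.10, 0.05, -0.02, 0.08],
    "average": [-0.04, 0.00, -0.10, 0.02],
    "low":     [-0.06, -0.12, 0.03, -0.01],
}


def study_effect_size(es_by_level):
    """Single study-level effect size: the median of all data points."""
    all_values = [es for values in es_by_level.values() for es in values]
    return median(all_values)


overall = study_effect_size(effect_sizes)
by_level = {level: median(values) for level, values in effect_sizes.items()}
print(f"study ES = {overall:+.3f}")
print(by_level)
```

The per-level medians correspond to the breakdown by achievement level described in the text, while the overall median is the single value reported for the study.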

In pooling findings across studies, medians rather than means were used, prin- 
cipally to avoid giving too much weight to outliers. However, any measure of 
central tendency in a meta-analysis or best-evidence synthesis should be interpreted 
in light of the quality and consistency of the studies from which it was derived, not 
as a finding in its own right. 



A total of 29 studies of tracking or streaming in secondary schools met the
inclusion criteria listed earlier. The studies, their major characteristics, and their
findings are listed in Table 1. 

The studies listed in Table 1 are organized in three categories according to their 
research designs. Six studies used random assignment of students to ability-grouped 
or heterogeneous classes. Nine studies took groups of students; matched them 
individually on IQ, composite achievement, and other measures; and then assigned 
one of each matched pair of students to an ability-grouped class and one to a 
heterogeneous class. The quality of these randomized or matched experimental 
designs is very high, and the findings of the 15 studies using such designs must be 
given special weight. The remaining 14 studies investigated existing schools or 
classrooms that used or did not use ability grouping, and then either selected 
matched groups of students from within each type of school or used analyses of 
covariance or other statistical procedures to equate the groups. The difficulty
inherent in such designs is that any differences between schools that are
systematically related to ability grouping would be confounded with the practice of ability
grouping per se. For example, a secondary school that used heterogeneous grouping
might have a staff, principal, or community more concerned about equity, affective 
development, or other goals than would a "matched" school that used ability 
grouping. However, several of the correlational studies used very large samples and
longitudinal designs, and these provide important additional information not 
obtainable from the typically smaller and shorter experimental studies. 

Within each category, studies are listed in descending order of sample size. All 
other things being equal, therefore, studies near the top of Table 1 should be 
considered as better evidence of the effects of ability grouping than studies near the
end of the table. However, the nature and quality of the studies are discussed in 
more detail in the following sections. 



Across the 29 studies listed in Table 1, the effects of ability grouping on student 
achievement are essentially zero. The median effect size (ES) for the 20 studies 
from which effect sizes could be estimated was -.02, and none of the 9 additional 
studies found statistically significant effects. Counting the studies with nonsignifi- 
cant differences as though they had effect sizes of .00, the median effect size for all 
29 studies would be .00. Results from the 15 randomized and matched experimental
studies were not much different; the median effect size was -.06 for the 13 studies
from which effect sizes could be estimated. In 9 of these 13 studies (including all 5
of the randomized studies) results favored the heterogeneous groups, but these 
effects are mostly very small. 

There are few consistent patterns in the study findings. Most of the studies 
involved Grades 7-9, with ninth graders sometimes in junior high schools and 
sometimes in senior high schools. No apparent trend is discernible within this 
range. Above the ninth grade the evidence is too sparse for firm conclusions. Lovell 
(1960) found that high achieving tenth graders performed significantly better in 
ability-grouped English classes, but there were no effects in biology or algebra and 
no effects for average or low achievers. In a 4-year study of students in Grades 9- 
12, Borg (1965) found significant positive effects of ability grouping for average 
and low achievers in math but no differences in science or for high achievers. 
Cohorts followed from Grades 7-10 and 8-11 showed no significant differences on
any measure for any ability level. In contrast, Thompson (1974), in a study of 11th
grade social studies, found the largest effects favoring heterogeneous grouping (ES 
= -.48), whereas Kline (1964), in another 4-year study of students in Grades 9- 
12, found no differences. 

Twelve of the 29 studies tracked students for all subjects according to one 
composite ability or achievement measure. The remaining 17 studies grouped 
students on the basis of performance in one or more specific subjects. However, 
there were no differences in the outcomes of these different forms of ability 
grouping. In addition, there were no consistent patterns in terms of the number of 
ability groups to which students were assigned (the great majority of studies used 
3). Study duration had no apparent impact on outcome. Studies that used adjusted 
gain scores produced the same effects as other studies, and the use of the adjustment 
of gain scores described above made no difference in outcomes. 

There was no discernible pattern of findings with respect to different subjects,
with one possible exception. Studies by Marascuilo and McSweeney (1972),
Thompson (1974), and Fowlkes (1931) found relatively strong effects favoring 
heterogeneous grouping in social studies, and three additional studies by Peterson 
(1966), Martin (1927), and Postlethwaite and Denton (1978) found no differences 
or slight effects in the same direction. This is not enough evidence to conclusively 
point to a positive effect of heterogeneous grouping in social studies, but it is 
important to note that all three of the randomized or matched experimental studies 
found differences in this direction. 

There were no consistent effects according to study location. All four of the 
British studies found no differences between streamed and unstreamed classes; a 
large, longitudinal Swedish study by Svensson (1962), not shown in Table 1 because 
it lacked adequate evidence of initial equality, also found no differences between 
streamed and unstreamed classes. Urban, suburban, and rural schools had similar 
outcomes. The one study that involved large numbers of minority students, a 
randomized experiment in a New York City high school by Ford (1974), found no 
differences between ability-grouped and heterogeneous math classes. 

Studies conducted before 1950 were no more likely than more recent studies to 
find achievement differences. On this topic, it is interesting to note that experimen- 
tal-control studies of ability grouping have not been done in recent years. The only 
study of the 1980s, by Kerckhoff (1986), was done by a sociologist who focused his 
attention on differences between students in different streams. This study is 
described in more detail below. Otherwise, the most recent experimental-control 
comparisons were done in the early 1970s. 

Differential Effects According to Achievement Levels 

One of the most important questions about ability grouping in secondary schools 
concerns the degree to which it differentially affects students at different achieve- 
ment levels. As noted earlier, many researchers and reviewers, particularly those 
working in the sociological tradition, have emphasized the relative impact of grouping
for different groups of students far more than the average effect for all students. 

Twenty-one of the 29 studies presented in Table 1 presented data on the effects 
of ability grouping on students of different ability levels. Most studies divided their 
samples into three categories (high, average, and low achievers), but some used two 
or four categories. 

Across the 15 studies from which effect sizes could be computed, the median 
effect size was +.01 for high achievers, -.08 for average achievers, and -.02 for 
low achievers. Effects of this size are indistinguishable from zero, and if all the 
nonsignificant differences found in studies from which effect sizes could not be 
computed are counted as effect sizes of .00, the median effect size for each level of 
student becomes .00. In addition, only one of seven studies from which effect sizes
could not be computed (Lovell, 1960) found significantly positive effects of ability
grouping for high achievers, and none of these studies found significant effects in 
either direction for average and low achievers. The randomized and matched 
experimental studies provided slightly more support for the idea that ability 
grouping has a differential effect; the median effect sizes for high, average, and low 
achievers were +.05, -.10, and -.06, respectively. It is interesting to note that the 
study by Borg (1965), which is often cited to support the differential effect of ability 
grouping on students of different ability levels, in fact provides very weak support
for this phenomenon. Across two measures given to members of four 4-year cohorts
that principally included secondary years, significant effects favoring ability
grouping were found for high achievers in one of eight comparisons, for average achievers
in three of eight, and for low achievers in one of eight. Only in a cohort that 
included Grades 4 to 7 were there significant effects favoring heterogeneous 
grouping for low achievers. 

It might be expected that differential effects of track placement would build over 
time and that longitudinal studies would show more of a differential impact than 
1-year studies. The one multiyear randomized study, by Marascuilo and Mc- 
Sweeney (1972), did find that over a 2-year period, students in the top social studies 
classes gained slightly more than similar students in heterogeneous classes (ES = 
+.14), whereas middle (ES = -.37) and low (ES = -.43) groups gained significantly
less than their ungrouped counterparts. However, across seven multiyear correla- 
tional studies of up to 5 years' duration, not one found a clear pattern of differential 
effects. 

A few studies provided additional information on differential effects of ability 
grouping by investigating effects of grouping on high or low achievers only. For 
example, Torgelson (1963) randomly assigned low achieving students in Grades
7-9 to homogeneous or heterogeneous classes. Across several performance measures,
the median effect size was +.13 (nonsignificantly favoring ability grouping).
Similarly, Borg and Prpich (1966) randomly assigned low achieving 10th graders to
ability-grouped or heterogeneous English classes and found that there were no 
differences in one cohort. In a second cohort, differences favoring ability grouping 
on a writing measure were found, but there were no differences on eight other 
measures. 

Studies of ability grouping of high achievers are difficult to distinguish from 
studies of special programs for the gifted. Well-designed studies of programs for the 
gifted generally find few effects of separate programs for high achievers unless the 
programs include acceleration (exposure to material usually taught at a higher 
grade level) (Fox, 1979; Kulik & Kulik, 1984). That is, grouping per se has little 
effect on the achievement of high achievers. An outstanding illustration of this is a 
dissertation by Mikkelson (1962), who randomly assigned high achieving seventh 
and eighth graders to ability-grouped or heterogeneous math classes. The seventh 
grade homogeneous classes were given enrichment, but the eighth graders were 
accelerated, skipping to ninth grade algebra. No effects were found for the seventh 
graders. The accelerated eighth graders, of course, did substantially better than
similar students who were not accelerated on an algebra test, and they did no worse 
on a test of eighth grade math. 

Taken together, research comparing ability-grouped to heterogeneous placements 
provides little support for the proposition that high achievers gain from grouping 
whereas low achievers lose. However, there is an important limitation to this 
conclusion. In most of the studies that compared tracked to untracked grouping 
plans (including all of the randomized and matched experimental studies), tracked 
students took different levels of the same courses (e.g., high, average, or low sections 
of Algebra 1). Yet much of the practical impact of tracking, particularly at the
senior high school level, is on determining the nature and number of courses taken 
in a given area. The experimental studies do not compare students in Algebra 1 to 
those in Math 9, or students who take 4 years of math to those who take 2. The
conclusions drawn in this section are limited, therefore, to the effects of between- 
class grouping within the same courses, and should not be read as indicating a lack 
of differential effects of tracking as it affects course selection and course require- 
ments. 



The studies discussed above and summarized in Table 1 evaluated the most 
common forms of ability grouping in secondary schools: full-time, between-class
ability grouping for one or more subjects. However, a few studies have evaluated 
other grouping plans. 

The most widely used form of grouping in elementary schools, within-class ability 
grouping, has also been evaluated in a few studies involving middle and junior high 
schools. Campbell (1965) compared the use of three math groups within the class 
to heterogeneous assignment in two Kansas City junior high schools. There were 
no differences between the two programs in achievement. Harrah (1956) compared 
five types of within-class grouping in Grades 7-9 in West Virginia and found ability 
grouping to be no more successful than other grouping methods. Note that these 
findings conflict with those of studies of within-class ability grouping in
mathematics in the upper elementary grades, which tended to support the use of math
groups (Slavin, 1987). 

Vakos (1969) evaluated the use of a combination of heterogeneous and homo- 
geneous instruction in 11th grade social studies classes in Minneapolis. Students 
were grouped by ability 2 days each week, but heterogeneously grouped the other 
3 days. No achievement differences were found. Zweibelson, Bahnmuller, and 
Lyman (1965) evaluated a similar mixed approach to teaching ninth grade social
studies in New Rochelle, New York, and also found no achievement differences. 
Chiotti (1961) compared a flexible plan for grouping junior high school students 
across grade lines for mathematics to both ability-grouped and heterogeneous 
grouping plans, and again found no differences in achievement. A cross-grade 
grouping arrangement similar to the Joplin Plan (Slavin, 1987) was compared to 
within-class grouping in reading by Chismar (1971) in Grades 4-8. Significantly 
positive effects of this program were found in Grades 4 and 7 but not 5, 6, and 8. 

Reconciling Track/No-Track and High-Track/Low-Track Studies

As noted earlier, two very different traditions of research have dominated research 
on ability grouping. One involves comparisons of ability-grouped to heterogeneous
placements. The other involves comparisons of the progress made by students in 
different ability groups or tracks. Whereas there has been little experimental research 
comparing ability-grouped to heterogeneous placements since the early 1970s, 
research comparing the achievement of students in different tracks largely began 
in the 1970s and continues to the present. 

The findings of high-track/low-track studies of ability grouping conflict with 
those emphasized in this review in that they generally find that even after controlling 
for IQ, socioeconomic status, pretests, and other measures, students in high tracks 
gain significantly more in achievement than do students in low tracks, especially 
in mathematics (see Gamoran & Berends, 1987, for a review). How can these
findings be reconciled with those of the experimental studies? 

One important difference between experimental and correlational studies of 
ability grouping is that, as mentioned earlier, correlational studies (especially at the 
senior high school level) often include not only the effects of being in a high, 
average, or low class, but also the effects of differential course taking. Students in 
academic tracks may score better than those in general or vocational classes because 
they take more courses or more advanced courses. The experimental studies 
comparing grouped and ungrouped classes are all studies of grouping per se, holding 
course taking and other factors constant. The correlational studies examine tracking 
as it is in practice, where track placement implies differences in course requirements, 
course taking patterns, and so on. Also, experimental track versus no-track studies 
are rare beyond the ninth grade, whereas most correlational studies comparing
students in high versus low tracks involve senior high schools. The lack of track
versus no-track studies at the senior high school level is hardly surprising given the
nearly universal use of some form of tracking at that level. However, tracking 
usually has a different meaning in senior than in junior high school. Whereas junior 
high school tracking mostly involves different levels of courses (e.g., high English 
vs. low English), senior high tracking is more likely to involve completely different 
patterns of coursework (e.g., metal shop vs. French III). Also, the problem of 
dropouts becomes serious in senior high school; a study of 12th graders unavoidably
excludes the students who may have suffered most from being in the low track and 
left school (see Gamoran, 1987). This could reduce observed differences between 
high- and low-track students. 

There is limited evidence, however, that differences in course taking or grade 
level account for the different conclusions of the track/no-track and high-track/ 
low-track studies. Four-year longitudinal studies in U.S. senior high schools by 
Kline (1964) and Borg (1965) found no differential effects of track placement for 
high, average, and low achievers (as compared to similar students in untracked 
placements). Presumably, course-taking patterns in these senior high school studies 
varied by track. A correlational study by Alexander and Cook (1982) found that 
although taking more courses in senior high school did increase achievement
(controlling for background factors), different course-taking patterns in different
tracks did not account for track differences in achievement. Gamoran (1987) found
that track effects on math and science achievement were explained in part by the 
fact that students in the academic tracks take more math and science courses and, 
in particular, more advanced courses in these areas. However, no such patterns 
were seen on reading, vocabulary, writing, or civics achievement measures. Ga- 
moran noted the difficulty of disentangling track and course taking, which are 
highly correlated in math and science (and, of course, both track and course taking 
are strongly correlated with ability, socioeconomic status, and other factors). It is 
certainly logical to expect correlational studies of senior high school tracking to 
find different effects of different track placements because of different course-taking 
patterns, but because of confounding of tracking, course-taking, and student 
background factors, that is difficult to determine conclusively. 

Another likely explanation for different findings of track/no-track and high- 
track/low-track studies involves the difficulty of statistically controlling for large 
differences. Students in higher tracks tend to achieve at much higher levels than 
those in lower tracks (both before and after taking secondary courses), and statis- 
tically controlling for these differences is probably not sufficient to completely 
remove the influence of ability or prior performance on later achievement. Further, 
students in higher tracks are also likely to be higher in such attributes as motivation, 
internal locus of control, academic self-esteem, and effort, factors that are not likely 
to be controlled in correlational studies. 

To understand the difficulty of controlling for large initial differences between 
students, imagine an experiment in which a new instructional method was to be 
evaluated. The experimenter selects a group of students who have high test scores 
and high IQ scores and are nominated by their teachers as being hard working, 
motivated, and college material. This group becomes the experimental group, and 
the remaining students serve as the control group. To control for the differences 
between the groups, prior composite achievement and socioeconomic status are 
used as covariates or control variables. 

In such an experiment, no one would doubt that regardless of the true effective- 
ness of the innovative treatment, the experimental group would score far better 
than the control group, even controlling for prior achievement and socioeconomic 
status. No journal or dissertation committee would accept such a study. Yet this 
"experiment" is essentially what is being done when researchers compare students 
in different tracks. When there are significant pretest differences, use of statistical 
controls through analysis of covariance or regression is considered inadequate to 
equate the groups. Most often, the statistical controls will undercontrol for true 
differences (Lord, 1960; Reichardt, 1979). Yet high- and low-track students usually 
differ in pretests or IQ by one to two standard deviations, an enormous systematic 
difference for which no statistical procedure can adequately control. 

The only study that compared both tracked to untracked schools and high-track 
to low-track students was a 5-year longitudinal study by Kerckhoff (1986) in 
Britain. This study illustrates the problem of controlling for large differences. For 
example, in mathematics, boys in the high track of three-group ability grouping 
programs gained about 11 z score points from a test given at age 11 to one given 
at age 16, whereas students in a remedial track gained 18 z score points. Yet the 
regression coefficient comparing the high-track to ungrouped students was +2.34, 
indicating performance about 42% of a standard deviation above "predicted" 
performance. In contrast, the remedial-track boys had a regression coefficient (in 
comparison to ungrouped students) of -.72, indicating performance about 13% of 
a standard deviation below "predicted" performance, despite the fact that the 
remedial students actually gained more than the top-track students. The reason for 
this is that the remedial students started out (at age 11) scoring 1.64 standard 
deviations below the ungrouped students, whereas top-track students started out 
1.02 standard deviations above the ungrouped students, a total difference between 
top-track and remedial students of 2.66. No regression or analysis of covariance 
can adequately control for such large pretest differences. Because of unreliability 
in the measures and less-than-perfect within-group correlations of pre- and posttests, 
"predicted" scores based on pretests and other covariates will (other things being 
equal) be too low for high achievers and too high for low achievers. 
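
The undercontrol problem described above can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a reanalysis of any study cited here: the sample size, the test reliability of 0.8, the rule that placement follows true ability exactly, and the function names are all hypothetical. Track membership has no effect on the simulated posttest, yet the regression coefficient for the high track, controlling for the pretest, comes out clearly positive because the pretest measures ability with error.

```python
import random
import statistics


def ols_two_predictors(y, x1, x2):
    """Least-squares slopes (b1, b2) for y = a + b1*x1 + b2*x2."""
    my, m1, m2 = statistics.fmean(y), statistics.fmean(x1), statistics.fmean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulated_track_coefficient(n=20000, reliability=0.8, seed=0):
    """Coefficient on a high-track indicator when the true track effect is zero.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Error SD chosen so that var(true score) / var(observed score) = reliability.
    err_sd = ((1 - reliability) / reliability) ** 0.5
    pre, post, high = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)                    # unobserved true ability
        pre.append(ability + rng.gauss(0, err_sd))   # pretest with measurement error
        post.append(ability + rng.gauss(0, err_sd))  # posttest; no track effect is added
        high.append(1.0 if ability > 0 else 0.0)     # placement follows true ability
    return ols_two_predictors(post, pre, high)[1]


biased_estimate = simulated_track_coefficient()
error_free_estimate = simulated_track_coefficient(reliability=1.0)
```

With a perfectly reliable pretest (`reliability=1.0`) the spurious coefficient vanishes, which is the sense in which unreliability in the measures drives the undercontrol.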

Another factor that can contribute to overestimates of the effects of curriculum 
track on achievement in studies lacking heterogeneous comparison groups is fan 
spread. Put simply, high achievers usually gain more per year than do low achievers, 
so over time the gap between high and low achievers grows. This increasing gap 
cannot be unambiguously ascribed to ability grouping or other school practices, as 
it occurs under virtually all circumstances. A student who is performing at the 16th 
percentile in the 6th grade and is still at the 16th percentile in 12th grade will be 
further "behind" the 12th grade mean in grade equivalents, for example (Coleman 
& Karweit, 1972). 
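
The arithmetic behind fan spread can be sketched as follows. The standard deviations in grade equivalents used here (1.5 at Grade 6 and 3.0 at Grade 12) are hypothetical values chosen only for illustration and are not taken from any test's norms; the point is simply that a fixed percentile rank translates into a larger grade-equivalent deficit whenever the spread of scores grows.

```python
from statistics import NormalDist


def grade_equivalent_gap(percentile, sd_in_grade_equivalents):
    """Distance from the mean, in grade equivalents, at a fixed percentile rank."""
    z_score = NormalDist().inv_cdf(percentile / 100)  # about -1 at the 16th percentile
    return z_score * sd_in_grade_equivalents


# Hypothetical spreads of achievement, in grade equivalents (illustrative only).
SD_GRADE_6 = 1.5
SD_GRADE_12 = 3.0

gap_grade_6 = grade_equivalent_gap(16, SD_GRADE_6)
gap_grade_12 = grade_equivalent_gap(16, SD_GRADE_12)
```

Under these assumed spreads the student's relative standing is unchanged, yet the deficit measured in grade equivalents doubles between the two grades.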

An additional factor that can contribute to spurious findings indicating a benefit 
of being in the high track is that factors other than test scores factor into placement 
decisions. For example, a study by Balow (1964) found that on math tests not used 
for group placement, there was enormous overlap between students in supposedly 
homogeneous seventh-grade math classes. More than 72% of the students scored 
between the lowest score in the top group and the highest score in the bottom 
group. Among these students in the "area of overlap," students who were in the 
top group gained the most in math achievement over the course of the year, whereas 
those in the low group gained the least. 

On its surface this study provides support to the "self-fulfilling prophecy" 
argument. Yet consider what is going on. Imagine two students with identical 
scores, one assigned to the high group and one to the low group. Why were they so 
assigned? Random error is a possibility but all the systematic possibilities weigh in 
the direction of higher performance for the student assigned to the high group. 
Because teacher judgement was involved, teachers may have accurate knowledge 
of student motivation, self-esteem, behavior, or other factors to enable them to 
predict who will do well and who will not. The actual assignments were done on 
different tests than those used in the Balow study; it is likely that students who 
scored low on Balow's pretests but were put in the high groups scored high on the 
test used for placement, and then regressed to a higher mean on Balow's posttest. 
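
The selection argument above can be checked with a simulation. This is a sketch under assumptions that are not drawn from Balow's data: each student has one underlying ability, the placement test, the separate pretest, and the posttest are independent noisy measures of it, and group membership adds nothing to the posttest. The error size, the width of the "overlap" band, and the function name are all hypothetical.

```python
import random
import statistics


def overlap_band_gap(n=50000, err_sd=0.6, band=0.25, seed=1):
    """Posttest gap (high group minus low group) among students with similar pretests.

    No effect of group membership is simulated; any gap reflects selection alone.
    """
    rng = random.Random(seed)
    high_post, low_post = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)                   # unobserved true ability
        placement = ability + rng.gauss(0, err_sd)  # test actually used for grouping
        pretest = ability + rng.gauss(0, err_sd)    # separate test, not used for grouping
        posttest = ability + rng.gauss(0, err_sd)   # identical process in both groups
        if abs(pretest) < band:                     # students with near-identical pretests
            (high_post if placement > 0 else low_post).append(posttest)
    return statistics.fmean(high_post) - statistics.fmean(low_post)


selection_gap = overlap_band_gap()
```

Among students with nearly identical pretest scores, those placed in the high group still score higher on the simulated posttest, even though the groups are treated identically.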

What this discussion is meant to convey is not that different tracks do or do not 
have a differential impact on student achievement, but that comparisons of students 
in existing tracks cannot tell us one way or another. To learn about the differential 
impacts of track placement, there are two types of research that might be done. 
One would be to randomly assign students at the margin to different tracks, 
something that has never been done. The other is to compare similar students 
randomly assigned to ability-grouped or ungrouped systems. This has been done 
several times, and, as noted earlier in this review, there is no clear trend indicating 
that students in high-track classes learn any more than high achieving students in 
heterogeneous classes, or that students in low-track classes learn any less than low 
achieving students in heterogeneous classes. 

Why Is Ability Grouping Ineffective? 

The evidence summarized in Table 1 and discussed in this review is generally 
consistent with the conclusions of earlier reviews comparing homogeneous and 
heterogeneous grouping (e.g., Kulik & Kulik, 1982, 1987; Noland, 1985), but runs 
counter to two quite different kinds of "common sense." On one hand, it is 
surprising to find that assignment to the low-ability group is not detrimental to 
student learning. A substantial literature has indicated the low quality of instruction 
in low groups (e.g., Evertson, 1982; Gamoran, 1989; Oakes, 1985), and a related 
body of research has documented the negative impact of ability grouping on the 
motivations and self-esteem of students assigned to low groups (e.g., Cottle, 1974; 
Schafer & Olexa, 1971; Trimble & Sinclair, 1987). How can the effect of ability 
grouping on low achieving students be zero, as this review concludes? 

On the other hand, another kind of "common sense" would argue that, at least 
in certain subjects, ability grouping is imperative in secondary schools. How can 
an 8th grade math teacher teach a class composed of students who are fully ready 
for algebra and students who are still not firm in subtraction and multiplication? 
How does an English teacher teach literature and writing to a class in which reading 
levels range from 3rd to 12th grade? Yet study after study, including randomized 
experiments of a quality rarely seen in educational research, finds no positive effect 
of ability grouping in any subject or at any grade level, even for the high achievers 
most widely assumed to benefit from grouping. 

The present review cannot provide definitive answers to these questions. How- 
ever, it is worthwhile to speculate on them. 

One possibility is that the standardized tests used in virtually all of the studies 
discussed in this review are too insensitive to pick up effects of grouping. This 
seems particularly plausible in looking at tests of reading, because reading has not 
generally been taught as such in secondary schools. However, standardized tests of 
mathematics do have a great deal of face validity and curricular relevance, and 
these show no more consistent a pattern of outcomes. Marascuilo and McSweeney 
(1972) used both teacher-made and standardized measures of social studies achieve- 
ment and found similar results with each. 

Another possibility is that it simply does not matter whom students sit next to 
in a secondary class. Secondary teachers use a very narrow range of teaching 
methods, overwhelmingly using some form of lecture or discussion (Goodlad, 
1983). In this setting, the direct impact of students on one another may be minimal. 
If this is so, then any impacts of ability grouping on students would have to be 
mediated by teacher characteristics or behaviors or by student perceptions and 
motivations. 

Studies contrasting teaching behaviors in high- and low-track classes usually find 
that the low tracks have a slower pace of instruction and lower time on-task (e.g., 
Evertson, 1982; Oakes, 1982). Yet, as noted earlier, the meaning and impact of 
these differences are not self-evident. It may be that a slower pace of instruction is 
appropriate with lower-achieving students, or that pace is relatively unimportant 
because a higher pace with lower mastery is essentially equivalent to a lower pace 
with higher mastery. Higher time on-task should certainly be related to higher 
achievement (Brophy & Good, 1986), but the comparisons of time on task between 
high and low tracks are misleading. What would be important to compare is time 
on task for low achievers in homogeneous and heterogeneous classes, because low 
achievers may simply be off-task more than high achievers regardless of their class 
placement. In this regard, it is important to note that Evertson, Sanford, and 
Emmer (1981) found time on-task to be lower in extremely heterogeneous junior 
high school classes than in less heterogeneous ones because teachers had difficulty 
managing the more heterogeneous classes. 

The lesson to be drawn from research on ability grouping may be that unless 
teaching methods are systematically changed, school organization has little impact 
on student achievement. This conclusion would be consistent with the equally 
puzzling finding that substantial reductions in class size have little impact on 
achievement (Slavin, 1989); if teachers continue to use some form of lecture/ 
discussion/seatwork/quiz, then it may matter very little in the aggregate which or 
how many students the teachers are facing. In contrast, forms of ability grouping 
that were found to make a difference in the upper elementary grades — the Joplin 
Plan (cross-grade grouping in reading to allow for whole-class instruction) and 
within-class grouping in mathematics (Slavin, 1987)— both significantly change 
time allocations and instructional activities within the classroom. 



If the effects of ability grouping on student achievement are zero, then there is 
little reason to maintain the practice. As noted earlier in this article, arguments in 
favor of ability grouping depend on assumptions about the effectiveness of grouping, 
at least for high achievers. In the absence of any evidence of effectiveness, these 
arguments cannot be sustained. 

Yet there is also no evidence that simply moving away from traditional ability 
grouping practices will in itself enhance student achievement, and there are legiti- 
mate concerns expressed by teachers and others about the practical difficulties of 
teaching extremely heterogeneous classes at the secondary level. How can schools 
moving away from traditional ability grouping use this opportunity to contribute 
to student achievement? 

One alternative to ability grouping often proposed (e.g., Oakes, 1985) is the use 
of cooperative learning methods, which involve students working in small, hetero- 
geneous learning groups. Research on cooperative learning consistently finds posi- 
tive effects of these methods if they incorporate two major elements: group goals 
and individual accountability (Slavin, 1990). That is, the cooperating groups must 
be rewarded or recognized on the basis of the sum or average of individual learning 
performances. Cooperative learning methods of this kind have been used success- 
fully at all grade levels, but there is less research on them in Grades 10-12 than in 
Grades 2-9 (see Newmann & Thompson, 1987). Cooperative learning methods 
have also had consistently positive impacts on such outcomes as self-esteem, race 
relations, acceptance of mainstreamed academically handicapped students, and 
ability to work cooperatively (Slavin, 1990). 

One category of cooperative learning methods may be particularly useful in 
middle schools moving toward heterogeneous class assignments. These methods 
are Cooperative Integrated Reading and Composition (Stevens, Madden, Slavin, & 
Farnish, 1987) and Team Assisted Individualization — Mathematics (Slavin & 
Karweit, 1985; Slavin, Madden, & Leavey, 1984). Both of these methods are 
designed to accommodate a wide range of student performance levels in one 
classroom, using both homogeneous and heterogeneous within-class groupings. 
These programs have been successfully researched in Grades 3-6 but are often used 
up to the eighth grade level. 

Other alternatives to between-class ability grouping have also been found to be 
successful in the upper elementary grades (see Slavin, 1987) and could probably be 
effective in middle schools as well. These include within-class ability grouping in 
mathematics (e.g., teaching two or three math groups within a heterogeneous class) 
and the Joplin Plan in reading. The Joplin Plan involves regrouping students for 
reading across grade levels but according to reading level, so that no within-class 
reading groups are necessary. However, although these alternatives to between-class 
grouping are promising because of their success in the upper elementary grades, 
the few studies of within-class ability grouping at the junior high school level have 
not found this practice to be effective (Campbell, 1965; Harrah, 1956), and the one 
middle school study of the Joplin Plan found only inconsistent positive effects 
(Chismar, 1971). (For descriptions of secondary schools implementing alternatives 
to traditional ability grouping, see Slavin, Braddock, Hall, & Petza, 1989.) 

Limitations of This Review 

It is important to note several limitations of the present review. Perhaps the most 
important is that in none of the studies reviewed here were there systematic 
observations made of teaching and learning. Observational studies and outcome 
studies have proceeded on parallel tracks; it would be important to be able to relate 
evidence of outcomes to changes in teacher behaviors or classroom characteristics. 
In particular, it would be important to know the degree to which teachers in ability- 
grouped schools actually differentiate instruction. For example, are teachers of high- 
track classes more likely to provide enrichment (e.g., greater depth on the same 
objectives) or acceleration (e.g., coverage of more material usually taught at a later 
grade level)? How do teachers of low-track classes adapt instruction to the needs of 
their students? How do teachers of untracked, heterogeneous classes accommodate 
the wide range of performance levels in their classes? What level and pace of 
instruction is provided in untracked, heterogeneous classes? Most important, how 
do variations from teacher to teacher in instructional behaviors in high, low, and 
heterogeneous classes relate to the outcomes of ability grouping for students of 
different ability levels? 

Another limitation, mentioned earlier, is that almost all studies reviewed here 
used standardized tests of unknown relationship to what was actually taught. It 
may be, for example, that positive effects of ability grouping for high achievers 
could be missed by standardized tests because what these students are getting is 
enrichment or higher-order skills not assessed on the standardized measures, or 
that negative effects for low achievers are missed because teachers of low-track 
classes are hammering away at the minimum skills that are assessed on the 
standardized tests but ignoring other content. Future research on ability grouping 
needs to closely examine possible outcomes of grouping on more broadly based 
and sensitive measures. 

A third limitation is the age of most of the studies reviewed. It is possible that 
schools, students, or ability grouping have changed enough since the 1960s and 
1970s to make conclusions from these and older studies tenuous. 

As noted earlier, the results reported in this review mainly concern the effects of 
grouping per se, with little regard for the effects of tracking on such factors as 
course taking. Effects of tracking on differential course taking are most important 
in senior high schools. There is a need for additional research comparing tracked 
to untracked situations at the senior high school level, particularly research designed 
to disentangle the effects of tracking from those of differential course taking. 

In addition, it would add greatly to the understanding of ability grouping in 
secondary schools to have evaluations or even descriptions of a wider range of 
alternatives to traditional ability grouping. The few studies of within-class grouping, 
cross-grade groupings, and flexible grouping plans are not nearly adequate to explore 
alternatives. Cooperative learning, often proposed as an alternative to ability 
grouping, has frequently been found to increase student achievement in ability- 
grouped as well as ungrouped secondary classes (Newmann & Thompson, 1987; 
Slavin, 1990), yet no study has compared cooperative learning in heterogeneous 
classes to traditional instruction in homogeneous ones. Descriptions of creative 
alternatives to ability grouping currently exist only at the anecdotal level (Slavin et 
al., 1989). 

Conclusions 

Although there are limitations to the scope of this review and to the studies on 
which it is based, there are several conclusions that can be advanced with some 
confidence. These are as follows: 

1. Comprehensive between-class ability grouping plans have little or no effect on 
the achievement of secondary students, at least as measured by standardized tests. 
This conclusion is most strongly supported in Grades 7-9, but the more limited 
evidence that does exist from studies in Grades 10-12 also fails to support any 
effect of ability grouping. 

2. Different forms of ability grouping are equally ineffective. 

3. Ability grouping is equally ineffective in all subjects, except that there may be 
a negative effect of ability grouping in social studies. 

4. Assigning students to different levels of the same course has no consistent 
positive or negative effects on students of high, average, or low ability. 

For the narrow but extremely important purpose of determining the impact of 
ability grouping on standardized achievement measures, the studies reviewed here 
are exemplary. Six randomly assigned individual students to ability-grouped or 
heterogeneous classes, and nine more individually matched students and then 
assigned them to one or the other grouping plan. Many of the studies followed 
students for 2 or more years. If there had been any true effect of ability grouping 
on student achievement, this set of studies would surely have detected it. 

For practitioners, the findings summarized above mean that decisions about 
whether or not to ability group must be made on bases other than likely effects on 
achievement. Given the antidemocratic, antiegalitarian nature of ability grouping, 
the burden of proof should be on those who would group rather than those who 
favor heterogeneous grouping, and in the absence of evidence that grouping is 
beneficial, it is hard to justify continuation of the practice. The possibility that 
students in the low groups are at risk for delinquency, dropout, and other social 
problems (e.g., Rosenbaum, 1980) should also weigh against the use of ability 
grouping. Yet schools and districts moving toward heterogeneous grouping have 
little basis for expecting that abolishing ability grouping will in itself significantly 
accelerate student achievement unless they also undertake changes in curriculum 
or instruction likely to improve actual teaching. 

There is much research still to be done to understand the effects of ability 
grouping in secondary schools on student achievement. Studies using more sensi- 
tive achievement measures, studies of grouping at Grades 10-12, studies of a 
broader range of alternatives to grouping, and studies relating observations to 
outcomes of grouping are areas of particular need. Enough research has been done 
comparing the effects of ability grouping on standardized achievement tests for 
students assigned to high, middle, and low tracks, at least up through the ninth 
grade. It is time to move beyond these simple comparisons to consider more fully 
how secondary schools can adapt instruction to the needs of a heterogeneous 
student body. 

THE VARIABLE EFFECTS OF HIGH SCHOOL TRACKING* 



Adam Gamoran 
University of Wisconsin, Madison 



The effects of tracking in high schools depend in part on the way tracking is organized: To 
the extent that the structure of tracking varies across schools, tracking's impact on achieve- 
ment also varies. I examine four structural characteristics of tracking systems: selectivity, 
electivity, inclusiveness, and scope. I predict that differences in these characteristics lead to 
variation in between- track inequality (the achievement gap between tracks) and school pro- 
ductivity (average achievement of students in the school), net of the composition of the stu- 
dent body. In addition, I hypothesize that Catholic schools have less inequality between 
tracks and higher productivity overall than public schools. I test the hypotheses using data 
from High School and Beyond, a national survey of high schools and their students. The 
results show that schools vary significantly in the magnitude of track effects on math achieve- 
ment, and they differ in net average achievement on both math and verbal tests. Schools with 
more mobility in their tracking systems produce higher math achievement overall. They also 
have smaller gaps between tracks in both math and verbal achievement when compared to 
schools with more rigid tracking systems. Moderately inclusive systems also have less be- 
tween-track inequality in math; and overall school achievement tends to rise in both subjects 
as inclusiveness increases. The hypotheses about Catholic schools are also supported, espe- 
cially for math achievement. The way Catholic schools implement tracking partially ac- 
counts for their advantages. 



Many writers have suggested that the effects 
of high school tracking on student achieve- 
ment vary among schools, but none has offered a 
compelling theory for why this may occur (Heyns 
1974; Hauser, Sewell, and Alwin 1976; Rosen- 
baum 1984). I use existing knowledge about 
tracking to develop hypotheses for between- 
school differences in tracking's effects. Building 
on the work of Sorensen (1970), I argue that the 
impact of tracking varies according to the struc- 
tural characteristics of school tracking systems. I 
also consider claims that tracking has different 
effects in public and Catholic schools (Gamoran 
and Berends 1987; Page and Valli 1990). I test 
these hypotheses by applying methods of multi- 
level contextual analysis to data on tracking and 
achievement in a national sample of high schools. 

TRACKING AND STUDENT 
ACHIEVEMENT 

Tracking may affect academic achievement in 
two ways. First, it may affect the dispersion of 
achievement, or educational inequality. Tracking 
adds to inequality when placement in a high-sta- 
tus track permits students to gain more than if 
they had been assigned to a lower track. A key 
question is whether some forms of tracking in- 
duce more inequality between tracks than others. 

Second, the particular structure of tracking may 
influence a school's overall level of achievement, 
or educational productivity. Is one type of track- 
ing system more productive than another? If so, 
then variation in the structure of tracking con- 
tributes to between-school variation in achieve- 
ment. 1 A concern with productivity must be paired 
with the study of inequality because if certain 
forms of tracking reduce achievement gaps be- 
tween tracks, it is essential to know whether this 
occurs in a context of higher, lower, or the same 
overall school achievement. 

Does Tracking Affect Inequality? 

For many years, students, teachers, and field re- 
searchers have reported that more learning oc- 
curs in higher tracks (Hargreaves 1967; 
Rosenbaum 1976; Metz 1978; Ball 1981; Bur- 
gess 1984; Oakes 1985). Most survey studies 
corroborate these reports after controlling for stu- 
dents' initial characteristics, including family 
background, race, gender, and prior achievement 
(for a review, see Gamoran and Berends 1987). 
Although the finding has not been universal — 
Jencks and Brown (1975) and Alexander and 
Cook (1982) raised doubts — the evidence for 
differences between tracks seems persuasive in 
well-specified, carefully controlled analyses us- 
ing national survey data for Britain, Israel, and 
the United States (Kerckhoff 1986; Shavit and 
Featherman 1988; Natriello, Pallas, and 
Alexander 1989). Even Slavin (1990), who main- 
tained that between-classroom ability grouping 
in American secondary schools has no effects, 
acknowledged that broad curriculum tracking 
probably magnifies inequality in achievement. 
Gamoran and Mare (1989) showed that this con- 
clusion holds even after taking into account the 
effects of unmeasured selection variables. 

1 I focus on variation among types of tracking sys- 
tems, not on the presence or absence of tracking. Al- 
most all American high schools use some form of 
tracking (Oakes, Gamoran, and Page 1992), and avail- 
able survey data do not readily permit comparisons of 
tracking to no tracking. Previous work simulated 
changes in inequality and productivity produced by 
the hypothetical absence of tracking compared to the 
average tracking system (Gamoran and Mare 1989). 
In that study, all tracking systems were assumed to 
operate similarly, and the question of inequality fo- 
cused on differences between subgroups (black ver- 
sus white, female versus male, economically 
advantaged versus disadvantaged), while the ques- 
tion for productivity considered the average gain due 
to tracking compared to the simulated absence of track- 
ing. In the present study, I am concerned with in- 
equality in achievement between the tracks themselves, 
and with differences in the school achievement levels 
associated with different tracking systems. 

Previous writers have disagreed about whether 
tracking's differentiating effects vary across 
schools. Citing case studies, Rosenbaum (1984) 
argued vehemently that such variation occurs, 
and he attributed discrepant survey findings in 
part to differences among tracking systems. In a 
nationwide study, Oakes (1985) described con- 
siderable variation in the characteristics of track- 
ing systems in 25 junior and senior high schools, 
but she did not examine whether these differ- 
ences affected the impact of tracking on achieve- 
ment. 

Heyns (1974) reported statistically significant 
interactions between track positions and dummy 
variables representing the high schools in her 
data set. However, because the interactions were 
relatively small, and because they were not re- 
lated to school size or location, she estimated an 
additive model of tracking's effects. Using the 
same approach, Hauser, Sewell, and Alwin (1976) 
found no significant between-school differences 
in track effects in Milwaukee County, Wiscon- 
sin. Consequently they, too, estimated an addi- 
tive model. Both Heyns's data, which were lim- 
ited to urban schools outside the South, and 
Hauser, Sewell, and Alwin's Wisconsin data, may 
be less variable than a national sample. Because 
of disagreement over the existence and magni- 
tude of between-school differences in track ef- 
fects, I first test for homogeneity of effects across 
schools, and then explore the sources of differ- 
ences that appear. 

Are Some Tracking Systems More Productive 
Than Others? 

A more productive tracking system is one that 
results in higher average achievement than a less 
productive one. In light of tracking's effects on 
inequality, this means that, with given propor- 
tions of students in the different tracks, a more 
productive system must have a greater positive 
effect for high-track students, or a less negative 
impact for students in low tracks, or some com- 
bination. 
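
The definition above amounts to treating a school's average achievement as a share-weighted sum of track effects. A minimal sketch of that arithmetic follows; the track names, shares, effect sizes, and baseline score are hypothetical numbers chosen only for illustration and do not come from High School and Beyond.

```python
def school_mean(baseline, track_shares, track_effects):
    """Average achievement as a baseline plus share-weighted track effects."""
    if abs(sum(track_shares.values()) - 1.0) > 1e-9:
        raise ValueError("track shares must sum to 1")
    return baseline + sum(track_shares[t] * track_effects[t] for t in track_shares)


# Hypothetical values for illustration only.
shares = {"academic": 0.40, "general": 0.35, "vocational": 0.25}
system_a = {"academic": 2.0, "general": 0.0, "vocational": -2.0}
system_b = {"academic": 2.0, "general": 0.0, "vocational": -1.0}  # less negative low-track effect

mean_a = school_mean(50.0, shares, system_a)
mean_b = school_mean(50.0, shares, system_b)
```

Holding the track shares fixed, the second system yields the higher school mean solely because its low-track effect is less negative, which also narrows the gap between tracks.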

Clearly, schools differ in their average achieve- 
ment levels. Even after adjusting for differences 
in student body composition, some schools ap- 
pear more productive than others. Although be- 
tween-school variation typically constitutes less 
than 20 percent of the total variation in student 
achievement, that amount is statistically signifi- 
cant (Bryk and Driscoll 1988; Lee and Bryk 
1989). We do not know whether this variation is
affected by the structure of tracking in schools. 
Gamoran (1987) reported higher overall vocabu- 
lary achievement in schools with larger college 
tracks, but the relation did not hold for math, 
science, reading, writing, or civics achievement. 
Moreover, the effect on vocabulary achievement 
vanished when students' own track positions were 
taken into account. To date, researchers have not 
presented a conceptual account for the impact of 
the structure of tracking on achievement levels 
in different tracks or in the school as a whole. 
Such an account must be based on knowledge of 
how the effects of tracking come about. 

How Do Track Effects Occur? 

Prior researchers have suggested that tracking 
influences student achievement through mecha- 
nisms of social-psychological and academic dif- 
ferentiation. Observers in Britain and the United 
States have argued that secondary school stratifi- 
cation polarizes the student body into pro-school 
and anti-school factions (Hargreaves 1967; Lacey
1970; Metz 1978; Ball 1981; Schwartz 1981). 
College-bound students conform to the school's 
demands, while others resist. Polarization may 
be stimulated by the labeling of students accord- 
ing to track positions (Schwartz 1981). Teach-
ers and guidance counselors communicate dif- 
ferential expectations to students by encouraging 
those in college-bound programs more than oth-
ers (Hargreaves 1967; Heyns 1974; Ball 1981).
Peer groups may also encourage polarization — 
observers and survey researchers have found that 
students tend to form friendships with others in 
the same track (Hargreaves 1967; Hauser, Sewell,
and Alwin 1976; Ball 1981; Eckert 1989; Hallinan
and Williams 1989). Social relations within 
friendship groups may promote differentiated at- 
titudes and behavior in school. Presumably as a 
result of these conditions, high-track students of- 
ten find greater meaning in school work, are more 
motivated, put forth greater effort, and hold higher 
expectations for themselves compared to low- 
track students. All these factors are said to lead to 
differences in achievement. 

No quantitative study has tested these claims 
by including student behavior, attitudes, and ex- 
pectations as intervening variables between track 
position and achievement. In fact, the evidence 
is inconclusive as to whether high school track- 
ing actually produces such social-psychological 
differentiation, or whether it simply reflects dif-
ferences already in place. Many studies have re-
ported differences among tracks in educational
expectations, even after controlling for academic 
plans at the outset of tracking (Rehberg and 
Rosenthal 1978; Alexander and Cook 1982; 
Wiatrowski, Hansell, Massey, and Wilson 1982;
Vanfossen, Jones, and Spade 1987; Berends
1991). The evidence concerning student attitudes
and behavior is more ambiguous. Wiatrowski et
al. (1982) found no track effects on self-esteem, 
attachment to school, or delinquent behavior. 
However, Berends's (1991) results supported the
polarization hypothesis for academic engagement 
and discipline problems, and Vanfossen, Jones, 
and Spade (1987) reported significant track ef-
fects on self-esteem and liking for school. Track- 
ing is clearly implicated in the differentiation of 
students' educational expectations, and possibly 
students' attitudes and behavior as well. Varia- 
tion in expectations, attitudes, and behavior may 
then contribute to variation in achievement.

Besides social-psychological differentiation, 
tracking also appears to produce differences in 
students' academic experiences that further dif- 
ferentiate achievement. Students in college-pre- 
paratory programs take more academic courses, 
particularly in math and science (Gamoran 1987; 
Vanfossen, Jones, and Spade 1987). In many sub- 
ject areas, they are exposed to more high-status 
knowledge (Keddie 1971; Burgess 1983, 1984; 
Oakes 1985; Page 1987, 1991). Teachers in high- 
track classes present more complex material at a 
faster pace (Metz 1978; Ball 1981; Oakes 1985),
and survey and observational studies have re- 
ported a more positive academic climate in high- 
track classes (Metz 1978; Oakes 1985; Vanfossen, 
Jones, and Spade 1987). Finally, teachers reputed 
to be more skillful are disproportionately assigned 
to high-track classes (Hargreaves 1967; Lacey 
1970; Rosenbaum 1976; Ball 1981; Finley 1984).
Although these between-track differences are 
clearly documented, their mediating role in the 
relation between tracking and achievement is less 
well established (Gamoran and Berends 1987; 
Gamoran, Nystrand, Berends, and LePore 1992). 
Nonetheless, instructional differentiation appears 
to be another important mechanism underlying 
track differences in achievement. 

STUDENT ACHIEVEMENT AND THE 
STRUCTURE OF STRATIFICATION 

Sorensen (1970) described organizational differ-
entiation in school systems, and his description
was elaborated by Rosenbaum (1976, 1984) and
Oakes (1985). In Sorensen's scheme, high school
tracking is an instance of horizontal differentia-
tion involving curricular differences within grade
levels, and vertical differentiation involving sta-
tus distinctions between academic and
nonacademic programs. Not all tracking systems
are alike, however. They vary along several struc-
tural dimensions, including (1) selectivity — the
degree of homogeneity within tracks; (2) electivity 
— whether students choose or are assigned to 
track positions; (3) inclusiveness — the subse- 
quent educational opportunities available; and (4) 
scope — the breadth and flexibility of track as- 
signment. How are these structural characteris-
tics related to variation in between-track inequal- 
ity and school productivity? 

Selectivity 

Sorensen defined selectivity as the amount of
homogeneity created by grouping students ac- 
cording to characteristics relevant for learning. 
Classes in a highly selective system are more 
homogeneous than the student body as a whole. 
Selectivity can also be viewed as the size of the 
gap between groups — the top group in a highly 
selective system is much higher on the selection 
criterion (e.g., ability) than other groups 
(Gamoran 1984). Thus, selectivity involves both 
the variance (homogeneity) and the means (lev- 
els) of the groups. The extent to which a school's
tracks are homogeneous and distinct is a func- 
tion of two conditions: the initial heterogeneity 
of the student body, and the policies that distrib- 
ute students to tracks. 
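
The two faces of selectivity described above, the gap between track means and the homogeneity within tracks, can be illustrated with a short sketch. This is not the author's operationalization (the measures used in the analysis are defined elsewhere in the article). The function names and the scores are hypothetical, and the pooled within-track standard deviation is only one of several reasonable homogeneity measures.

```python
from math import sqrt
from statistics import mean, pvariance


def track_gap(scores_academic, scores_other):
    """Difference between track means on the selection criterion."""
    return mean(scores_academic) - mean(scores_other)


def within_track_sd(scores_academic, scores_other):
    """Pooled within-track standard deviation (smaller = more homogeneous)."""
    n_a, n_o = len(scores_academic), len(scores_other)
    pooled_var = (
        n_a * pvariance(scores_academic) + n_o * pvariance(scores_other)
    ) / (n_a + n_o)
    return sqrt(pooled_var)


# Hypothetical prior test scores for one school (made-up numbers).
academic = [24, 26, 22, 28]
other = [14, 18, 16, 12]

gap = track_gap(academic, other)               # distance between levels
sd_within = within_track_sd(academic, other)   # homogeneity inside tracks
sd_total = sqrt(pvariance(academic + other))   # heterogeneity of student body

print(f"gap={gap}, within-track SD={sd_within:.3f}, total SD={sd_total:.3f}")
```

In a highly selective system the gap is large and the within-track spread is small relative to the spread of the whole student body, which is the pattern these made-up scores show.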

By definition, highly selective tracking sys- 
tems are elitist — they place high-achieving stu-
dents together to form homogeneous classes. 
Tracking tends to be especially visible in highly 
selective systems, with high academic status 
awarded to the "cream of the crop." By empha-
sizing the top track at the expense of other tracks, 
selectivity probably magnifies between-track 
variation in students' educational attitudes and 
expectations. If so, one would expect high selec-
tivity to accentuate between-track differences in
achievement. 

Moreover, highly selective tracking systems 
are often characterized by greater between-track
variability in students' instructional experiences. 
Because teachers adjust instruction to student 
aptitudes, tracks that differ more in initial levels 
of student performance are likely to vary more in 
their instructional regimes (Dahloff 1971; 
Lundgren 1972; Barr and Dreeben 1983), and
hence produce wider gaps in achievement. 



H1a: The greater the selectivity of a tracking sys-
tem, the larger the differences between tracks 
in achievement, when relevant prior charac- 
teristics of students are controlled. 

At the same time, greater selectivity may lead 
to higher achievement overall. Many educators 
maintain that homogeneous classes allow teach- 
ers to tailor the curriculum to students' needs
(Wilson and Schmits 1978). If there is an in- 
structional advantage to homogeneous grouping, 
that advantage is likely to be greater when the 
groups are more homogeneous (Slavin 1987). 

H1b: The greater the selectivity of a tracking sys-
tem, the higher the overall achievement in 
the school, when the composition of the stu- 
dent body is controlled. 

Hypothesis 1b is a prediction about average
achievement in the school, and it does not distin-
guish among the tracks. Taken together, how-
ever, Hypotheses 1a and 1b imply that selectiv-
ity adds to inequality by raising achievement in
high tracks more than in lower tracks. 2

Electivity 

Electivity refers to the extent to which students 
choose or are assigned to tracks (Sorensen 1970).
Several researchers have reported that even when 
students formally have a choice of tracks, in prac- 
tice they are highly influenced by school authori-
ties. Students and their parents are urged by teach- 
ers, principals, and guidance counselors to make 
the "right" choices according to their capacities 
(Cicourel and Kitsuse 1963; Ball 1981; Gamoran
1992). 

Nonetheless, many American high school stu- 
dents believe they chose their track positions 
(Jencks et al. 1972; Jones, Vanfossen, and Spade
1986). 3 These perceptions may be a more impor-
tant factor underlying track effects on achieve-



2 Sorensen (1970) also noted that schools differ in
the criteria used to assign students to programs. A key
issue is the extent to which placement relies on cogni-
tive characteristics, e.g., ability or achievement. This
issue can be subsumed under selectivity, because when
the selection process relies on achievement, more se-
lective systems by definition involve tighter links be-
tween cognitive characteristics and track positions.

3 In a random subsample of the nationally repre-
sentative High School and Beyond survey, about two-
thirds of high-school sophomores said they chose their
curricular program (Jones, Vanfossen, and Spade
1986).
ment than are the objective circumstances of as-
signment. Students who believe they selected their 
programs are more likely to be motivated to per- 
form, regardless of which track they are in. Thus, 
one may expect less social-psychological differ- 
entiation and, consequently, smaller between- 
track differences in achievement. Because the 
lower degree of differentiation occurs through 
more positive attitudes in all tracks, I also predict 
higher overall school achievement in a more elec- 
tive system. 

H2a: The greater the degree of electivity in a track-
ing system, the smaller the differences be-
tween tracks in achievement, when relevant
prior characteristics of students are controlled.

H2b: The greater the degree of electivity in a track-
ing system, the higher the average achieve- 
ment in the school, when the composition of 
the student body is taken into account. 

Sorensen (1970), by contrast, suggested that 
electivity magnifies tracking's effects on achieve-
ment. He reasoned that electivity leads to within- 
track homogeneity of educational aspirations, 
which in turn strengthens differential peer-group 
effects and thus increases the differences in 
achievement. 

Inclusiveness 

Inclusive tracking systems leave open students' 
options for future schooling (Sorensen 1970; see
also Rosenbaum 1976 and Kilgore 1991). A high 
school tracking system is more inclusive if it as- 
signs relatively more students to the college-pre- 
paratory curriculum. The larger the size of the 
college-bound track, the more salient it is likely 
to be — for those who are left out. The stigma of 
being excluded is greater when a larger propor- 
tion of students are included (Page 1991). For 
example, membership in a noncollege program
may incur greater stigma when it consists of the 
bottom 10 percent of the school's academic hier- 
archy than when it is the bottom 40 percent. Al- 
though an inclusive system is less elitist, it is 
highly visible and thus stigmatizes those left out 
of the preferred group. 

However, a system characterized by very low 
inclusiveness also probably raises the salience of 
the college track. Like high selectivity, low in- 
clusiveness reflects an elitist system, which may 
increase the degree of social-psychological and 
instructional differentiation among tracks. Hence, 
I expect larger differences between tracks in
achievement when inclusiveness is very low as
well as when it is high. Smaller achievement dif- 
ferences may occur when students are more 
evenly distributed across tracks. 

H3a: Controlling for relevant prior characteristics
of students, track differences in achievement 
are larger when the system is highly inclu- 
sive or minimally inclusive, and smaller when 
inclusiveness is moderate. 

The impact of inclusiveness on overall school 
achievement may also be nonlinear. In general, 
schools with larger college tracks may have higher 
average achievement, net of composition and stu- 
dents' track positions, because a large college 
track reflects greater academic emphasis in the 
school, which tends to raise achievement for all 
students regardless of track (Powell, Farrar, and
Cohen 1985; Lee and Bryk 1988). This effect, 
however, probably declines as inclusiveness be- 
comes very high because, as the academic track 
expands, students who are left out become in- 
creasingly stigmatized (Hypothesis 3a), depress- 
ing mean achievement. Thus, as the size of the 
academic track increases, the benefits of inclu- 
siveness may decline. 

H3b: Controlling for compositional differences,
greater inclusiveness in a tracking system 
contributes to higher average achievement, 
but at a declining rate. 

This nonlinearity may account for the weak 
linear effects of size of academic track on stu- 
dent achievement observed in earlier work 
(Gamoran 1987). 
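
One way to express Hypotheses 3a and 3b is with a squared term for inclusiveness in the school-level equations introduced in the Methods section. The sketch below is an assumed specification for illustration. $P_j$ denotes the proportion of students in the academic track in school $j$, and the coefficient labels are chosen here rather than taken from the article.

```latex
% School mean achievement (Hypothesis 3b): rising at a declining rate
\beta_{0j} = \gamma_{00} + \gamma_{01} P_{j} + \gamma_{02} P_{j}^{2} + u_{0j},
\qquad \gamma_{01} > 0,\; \gamma_{02} < 0,
\qquad \frac{\partial \beta_{0j}}{\partial P_{j}} = \gamma_{01} + 2\gamma_{02} P_{j}

% Net achievement gap between tracks (Hypothesis 3a): U-shaped
\beta_{1j} = \gamma_{10} + \gamma_{11} P_{j} + \gamma_{12} P_{j}^{2} + u_{1j},
\qquad \gamma_{11} < 0,\; \gamma_{12} > 0,
\qquad P^{*} = -\frac{\gamma_{11}}{2\gamma_{12}}
```

Under this form the gap between tracks is smallest at $P^{*}$ and grows toward either extreme, while the marginal benefit of inclusiveness for mean achievement shrinks as $P_j$ increases. A purely linear term would average over these opposing tendencies, which is consistent with the weak linear effects noted above.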

Scope 

Sorensen (1970) viewed scope as the extent to
which students are located in the same track across
subjects. Rosenbaum (1976) added track mobil-
ity (movement of students across tracks) to the 
concept. 4 Oakes (1985) further characterized 
scope to include "extent" (the proportion of all 
classes that are tracked), "pervasiveness" (the 
number of subject areas that are tracked), and 

4 Sorensen (1970) distinguished scope from "the 
rigidity of differentiation ... the extent to which stu- 
dents may transfer to another group than the one origi- 
nally assigned to" (1970, p. 363, note 2). Although
Sorensen believed this would involve few students, 
recent data suggest transfers are common, at least as 
indicated by self-report data (Gamoran 1987). Thus, 
the permanence of assignments, or track immobility, 
is considered part of tracking's scope. 
"flexibility" (whether track assignments are made
for each subject or across all subjects). 

Tracking systems with wide scope are likely 
to be more salient to students than systems with 
narrower scope. Status distinctions may be more 
meaningful if they apply to a large share of a 
student's school day and if they are consistent 
across subjects. In addition, tracking systems that 
cover more subjects and allow less mobility are 
more likely to produce differential friendship net- 
works (Sorensen 1970). The socialization effects 
of tracking are thus compounded in a system of 
wide scope, and therefore differences among 
tracks in achievement may be larger. 

Wider scope also means greater between-track 
variation in students' academic experiences. Stu- 
dents grouped for more subjects and for a longer 
time period are exposed to more differentiated 
instruction, thus increasing the net effects of track 
position on achievement. 

H4a: The wider the scope of a tracking system, the
larger the differences between tracks in 
achievement, when relevant prior character- 
istics of students are controlled. 

Also, a tracking system that is inflexible over 
time and across subjects may result in lower over- 
all achievement in the school, compared to a more 
flexible system. Failure to adjust assignments for 
developmental, motivational, or other changes 
in students' capacities for learning, and failure to 
recognize differences in students' aptitudes for 
different subjects, impede the match of instruc- 
tion to student needs (Slavin 1987). Hence, the 
differentiating effect of wide scope is likely to 
occur in a context of lower overall achievement.

H4b: The wider the scope of the tracking system,
the lower the average achievement in the 
school, when student body composition is 
controlled. 

Public Versus Catholic Schools

Prior research has suggested that tracking in
Catholic schools differs from tracking in public 
schools. First, Catholic schools place greater aca-
demic demands on students in noncollege tracks, 
requiring more academic courses and more rig- 
orous classwork, compared to noncollege tracks 
in public schools (Hoffer, Greeley, and Coleman
1985; Lee and Bryk 1988; Camarena 1990). 
Hence, the degree of instructional differentiation 
between tracks may be lower in Catholic schools. 
Second, an observational study of three Catholic 
high schools reported that students and teachers
hold positive views about assignment to low
tracks and are optimistic about the possibility of 
remediation (Valli 1990). This finding contrasts 
with the negative attitudes typically found in pub- 
lic schools (e.g., Oakes 1985), and suggests that 
tracking's impact on social-psychological differ- 
entiation may be less in Catholic schools. For 
these reasons, net achievement gaps between 
tracks are likely to be smaller in Catholic schools 
than in public schools. 

H5a: Differences between tracks in achievement
are smaller in Catholic schools than in public 
schools, when relevant prior characteristics 
of students are controlled. 

Several studies reported higher average 
achievement in Catholic schools compared to 
public schools (Hoffer et al. 1985; Lee and Bryk 
1988, 1989; for critiques, see Alexander and
Pallas 1985; Willms 1985; Jencks 1985). Part of 
the Catholic-school advantage may be tied to the 
way tracking is used — the relatively large size 
of the academic track, the emphasis on academic 
work in all tracks, and the lesser stigmatization of
low-track students — all may contribute to higher
achievement in Catholic schools (Hoffer et al. 
1985; Lee and Bryk 1988; Valli 1990). Hence, I 
predict higher overall achievement in Catholic 
schools than in public schools. 

H5b: Catholic schools have higher overall achieve-
ment, net of compositional differences, com- 
pared to public schools. Differences in the 
structure of tracking account for part of the 
Catholic-school advantage. 

METHODS 

These hypotheses describe effects at two levels 
of analysis: (1) student-level effects on achieve-
ment within schools, and (2) school-level effects 
on between-school differences in the impact of 
tracking, and on variation in school mean achieve- 
ment, net of compositional differences. To ad- 
dress both levels of analysis, I use a method called 
hierarchical linear modeling (HLM) (Bryk and 
Raudenbush 1992), also known as multilevel con-
textual analysis (Mason, Wong, and Entwistle
1983; DiPrete and Grusky 1991). HLM estimates
equations corresponding to the two levels of 
analysis. At the student level, achievement within 
each school is predicted: 

$$(\text{Achievement})_{ij} = \beta_{0j} + \beta_{1j}(\text{Track})_{ij} + \beta_{2}(\text{Background})_{ij} + e_{ij}. \qquad (1)$$

In this study, both the intercept, $\beta_{0j}$, and the track
effect, $\beta_{1j}$, are allowed to vary from school to
school, while the effects of background variables,
$\beta_{2}$, are constrained to be equal across schools. 5
The $\beta$ coefficients that vary across schools ($\beta_{0j}$
and $\beta_{1j}$) serve as dependent variables in the school-
level equations:

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{Sector})_{j} + \gamma_{02}(\text{Structure})_{j} + u_{0j}; \qquad (2)$$

$$\beta_{1j} = \gamma_{10} + \gamma_{11}(\text{Sector})_{j} + \gamma_{12}(\text{Structure})_{j} + u_{1j}. \qquad (3)$$

In equations 2 and 3, "sector" refers to Catholic
or public schools, and "structure" stands for the
structural characteristics of tracking systems. 6
When the within-school predictors are centered
around their grand means, $\beta_{0j}$ represents school
mean achievement adjusted for compositional
differences, i.e., net school productivity. 7 Inequal-
ity between tracks is reflected in $\beta_{1j}$, which is the
net achievement gap between tracks in each
school. In HLM, equations 1, 2, and 3 are esti-
mated simultaneously, producing maximum-like-
lihood estimates of the variance components,
which are then used to generate the $\beta$ and $\gamma$ coef-
ficients (for a more detailed account, see Bryk
and Raudenbush 1992).
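
The two kinds of centering used in this analysis can be sketched in a few lines. The example is illustrative only: the data are made up, the function names are hypothetical, and it shows the arithmetic of centering rather than anything specific to the HLM program.

```python
from collections import defaultdict
from statistics import mean


def grand_mean_center(values):
    """Deviate each value from the mean of the total sample."""
    m = mean(values)
    return [v - m for v in values]


def group_mean_center(values, groups):
    """Deviate each value from the mean of its own group (school)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    centered = [v - group_means[g] for v, g in zip(values, groups)]
    return centered, group_means


# Made-up data: eight students in two schools.
schools = ["A", "A", "A", "A", "B", "B", "B", "B"]
track = [1, 1, 1, 0, 1, 0, 0, 0]            # 1 = academic track
ses = [0.5, -0.5, 1.0, 0.0, -1.0, 0.5, -0.5, 0.0]

# Background variables with constant effects: center on the grand mean.
ses_c = grand_mean_center(ses)

# Track, whose effect varies across schools: center on each school's mean.
# The school means are the proportions of students in the academic track.
track_c, prop_academic = group_mean_center(track, schools)

print(prop_academic)
print(track_c)
```

The school means removed from the track variable are the proportions in the academic track, which is why that proportion can later re-enter the model as a school-level predictor.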

The HLM approach is superior to traditional 
techniques for measuring school effects and track 
effects. For example, one common strategy esti- 
mates the entire model at the student level, as- 
signing values of school-level variables to stu- 
dents within schools. This approach uses ordi- 
nary least squares (OLS) regression to obtain the 
track effects, and adds interaction terms to assess 
the impact of sectoral and structural variation on 
the effects of tracking: 

5 In equation 1, $\beta_{2}$ represents a number of back-
ground variables; I have written the equation as if
there were only one for the sake of simplicity. In
principle, the effects of the background variables could
also be allowed to vary between schools. However,
freeing more slopes multiplies the number of vari-
ances and covariances that are estimated, dramati-
cally increasing the complexity of the model and the
difficulty of estimation. For this reason, HLM users
are advised to start small, freeing parameters only
when there is theoretical interest in their variability
(Bryk and Raudenbush 1992).

6 Again, for simplicity I have written the equations
as if there were but one structural predictor, although
several will be included in the analyses.

7 To adjust the within-school intercepts for varia- 
tion in effects permitted to vary across schools, the 
within-school variable must be centered around each 
school's mean and the school's average for the vari- 
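
$$(\text{Achievement})_{i} = \beta_{0} + \beta_{1}(\text{Track})_{i} + \beta_{2}(\text{Background})_{i} + \beta_{3}(\text{Sector})_{i} + \beta_{4}(\text{Structure})_{i} + \beta_{5}(\text{Track} \times \text{Sector})_{i} + \beta_{6}(\text{Track} \times \text{Structure})_{i} + e_{i}. \qquad (4)$$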
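
The relation between equation 4 and equations 1 through 3 can be made explicit by substituting the school-level equations into the student-level equation. The combined form below follows from that substitution alone. The grouping of the error terms in brackets is added here for emphasis.

```latex
(\text{Achievement})_{ij}
  = \gamma_{00}
  + \gamma_{10}(\text{Track})_{ij}
  + \beta_{2}(\text{Background})_{ij}
  + \gamma_{01}(\text{Sector})_{j}
  + \gamma_{02}(\text{Structure})_{j}
  + \gamma_{11}(\text{Track})_{ij}(\text{Sector})_{j}
  + \gamma_{12}(\text{Track})_{ij}(\text{Structure})_{j}
  + \bigl[\, u_{0j} + u_{1j}(\text{Track})_{ij} + e_{ij} \,\bigr]
```

The fixed part has the same terms as equation 4, with $\gamma_{00}, \gamma_{10}, \gamma_{01}, \gamma_{02}, \gamma_{11}, \gamma_{12}$ in the roles of $\beta_{0}, \beta_{1}, \beta_{3}, \beta_{4}, \beta_{5}, \beta_{6}$. The difference lies in the bracketed error: it contains the school-level components $u_{0j}$ and $u_{1j}$, which are shared by all students in a school, whereas equation 4 has only the single term $e_{i}$.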

The main advantage of HLM is its treatment 
of error variance: Whereas equation 4 contains 
only one error term, equations 1, 2, and 3 parti- 
tion error variance into within-school and be- 
tween-school components. OLS confounds the 
two sources of error, a problem that is particu-
larly serious when one level of observations is 
clustered within a second, as when students are 
surveyed within schools. This violates the as- 
sumption of independent errors in the individual- 
level model (equation 4), leading to underesti- 
mated standard errors (Goldberger and Cain 
1982). By estimating separate school-level and 
student-level errors, HLM adjusts for the corre- 
lation of errors within schools (Bryk and 
Raudenbush 1992). 
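
A rough sense of how much clustering can matter comes from the standard design-effect approximation for cluster samples, 1 + (m - 1) * rho, where m is the number of students per school and rho is the intraclass correlation. The sketch below is an illustration, not part of the analysis: it assumes equal cluster sizes, it applies most directly to the precision of a mean, and the inflation for a regression coefficient depends also on how the predictor itself is clustered. The inputs echo figures mentioned in the text (up to 36 students per school, between-school variance below 20 percent of the total).

```python
from math import sqrt


def design_effect(cluster_size, icc):
    """Approximate variance inflation for equal-sized clusters."""
    return 1.0 + (cluster_size - 1) * icc


def se_inflation(cluster_size, icc):
    """Factor by which a naive (independence) standard error is too small."""
    return sqrt(design_effect(cluster_size, icc))


# Illustrative values only: 36 students per school, several plausible ICCs.
for icc in (0.05, 0.10, 0.20):
    deff = design_effect(36, icc)
    print(f"ICC={icc:.2f}  design effect={deff:.2f}  "
          f"SE understated by factor {se_inflation(36, icc):.2f}")
```

Even a modest intraclass correlation leaves naive standard errors noticeably too small when several dozen students share a school, which is the problem the partitioned error structure is meant to address.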

Another benefit of using HLM is that it esti- 
mates the total school-level variance in $\beta_{0j}$ and
$\beta_{1j}$, before and after the multilevel interactions
are included. I use this feature to indicate the 
degree to which sector and structural variables 
account for net between-school variation in 
achievement and track effects. I also use it to test 
for the heterogeneity of achievement means and 
distributions across schools. 
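
The comparison of school-level variance before and after predictors are added amounts to a proportional reduction in variance. A minimal sketch, assuming the two variance estimates are already in hand; the function name and the numbers are hypothetical and are not results from this study.

```python
def proportion_explained(var_before, var_after):
    """Share of school-level variance accounted for by added predictors.

    var_before: variance of a school-level parameter with no school-level
                predictors in its equation.
    var_after:  residual variance once sector and structure are included.
    """
    if var_before <= 0:
        raise ValueError("var_before must be positive")
    return (var_before - var_after) / var_before


# Hypothetical variance estimates, for illustration only.
share = proportion_explained(var_before=4.0, var_after=3.0)
print(f"{share:.0%} of between-school variance accounted for")
```

The same calculation applies separately to the intercept variance (school productivity) and to the variance of the track slope (between-track inequality).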

Of course, HLM does not resolve every statis- 
tical difficulty in estimating the effects of school- 
ing. One especially important issue for this study 
not specifically addressed by HLM is that stu- 
dents are assigned to tracks on the basis of antici- 
pated differences in the very outcomes in which 
we are interested. If this differential selection to 
tracks is not taken into account, then what appear 
to be track effects may simply reflect pre-exist- 
ing differences among students enrolled in the 
different tracks. A statistically related problem is 
that controls for prior conditions are not com-



able must be included in the equation for the intercept
(equation 2) (Bryk and Raudenbush 1992). In this
analysis, background variables have constant effects
across schools, so they are centered around their grand
means, i.e., they are deviated from the means of the
total sample. Because the effects of tracking are al-
lowed to vary across schools, the track variable is
centered around school means. Later in the analysis,
the proportion of students in the academic track is
included as a predictor of adjusted school achieve-
ment.
pletely reliable, which reduces their effectiveness
and possibly inflates the estimates of track
effects. My approach to dealing with selection
bias and unreliability in this analysis is to include
a rich set of controls for observed prior conditions
in the student-level equation. Although no
set of controls can eliminate selection bias with
certainty, previous research using these same data
has indicated that a comparable array of control
variables eliminates the correlation between
unobserved selection factors and subsequent
achievement, suggesting that selection bias can
be effectively reduced in this case (Gamoran and
Mare 1989). Jencks (1985) also advocated inclusion
of multiple prior test scores to compensate
for unreliability in these data.

Analyses were conducted using the HLM computer
program (Bryk, Raudenbush, Seltzer, and
Congdon 1988). Kreft, Kim, and DeLeeuw (1990)
provided a comparison of HLM with other programs
for multilevel analysis. I begin by examining
the extent to which schools vary in net
average achievement and in the net effect of
membership in the academic program on achievement.
I then explore the sectoral and structural sources
of between-school variation in these parameters.

DATA

For information on tracking, achievement, and
school characteristics, the best data set available
is High School and Beyond (HSB), a survey of a
national sample of high schools and students
begun in 1980 (Jones et al. 1983). For the present
analyses, I use data from 964 public and Catholic
schools in the 1980 (base year) and 1982 (first
follow-up) surveys. Data were gathered from a
random sample of up to 36 students in each
school, for a total of 28,804 students. I deleted 11
schools that had 10 or fewer student respondents.
I also eliminated 30 schools that had no college-track
students and 4 schools in which all students
reported belonging to the college track. By drawing
on information from other groups, HLM can
estimate parameters for a variable that has no
within-group variance (e.g., when all students are
in the same track). However, the study concerns
the impact of differences in the structure of tracking,
not the presence or absence of tracking.
Moreover, some of the structural variables were
undefined when all students reported the same
track, resulting in school-level missing data.
Additional missing data at the school level reduced
the sample to 883 schools (805 public and 78
Catholic), or 91.6 percent of the original sample.
All students with data on track position and 1982
achievement were included in the student-level
sample, for a total of 20,762, or 77.3 percent of
the students surveyed in the 883 schools. Missing
values on student-level independent variables
were imputed using regressions based on the
variables with data present.

Achievement Outcomes

Senior-year (1982) scores on multiple-choice tests
of mathematics and verbal achievement serve as
two separate individual-level outcomes. Heyns
and Hilton (1982) reported reliabilities of .85 (part
I) and .54 (part II) for the 38-item, two-part math
test. I summed the two parts to create a single
measure of math achievement. Verbal achievement
was calculated by adding scores on the 20-item
reading test and the 21-item vocabulary test,
for which Heyns and Hilton (1982) reported
reliabilities of .78 and .81, respectively. Table 1
provides sample means and standard deviations
for all variables.

Track Positions

Students' track positions are indicated by their
self-reported membership in an academic or
nonacademic program in their sophomore year
(1980). Use of sophomore-year track reports
eliminates the problem of whether senior-year
reports, which are also available in the data, are a
response to achievement rather than a cause.
Student reports do not always agree with school
officials' reports of track locations, presumably
because many schools do not formally label their
tracks (Moore and Davenport 1988) and students
are not always aware of their overall curricular
programs (Rosenbaum 1980). However, prior
research demonstrated that self-reports are relevant
for a study of track effects on achievement.
Self-reports are likely to capture the social-psychological
aspects of tracking because track perceptions
are linked to expectations and peer associations
(Gamoran and Berends 1987; Hallinan
and Williams 1989). Self-reports may be less sensitive
to instructional differences associated with
tracking, but previous work showed they correspond
reasonably well to courses taken. A 1972
national survey found that students' and school
officials' reports of track positions agreed in about
80 percent of cases (Fennessey, Alexander,
Riordan, and Salganik 1981). Vanfossen, Jones,
and Spade's (1987) analysis of data from a decade
later indicated that 85 percent of students
Table 1. Means and Standard Deviations of Variables in
the Analysis: High School and Beyond Survey,
1980 and 1982

Variable                                      Mean          S.D.

Student-level variables
  Math achievement, 1982               [illegible]   [illegible]
  Verbal achievement, 1982                  22.772         8.101
  Math achievement, 1980                    18.896         7.121
  Verbal achievement, 1980                  20.222         7.299
  Science achievement, 1980                 11.095         3.577
  Writing achievement, 1980                 10.417         3.783
  Sex (1 = female)                            .512          .500
  Ethnicity (1 = black or Hispanic)           .233          .423
  Socioeconomic status                       -.050          .709
  Academic track                              .336          .472

School-level variables
  Catholic                                    .099          .299
  School mean socioeconomic status           -.110          .367

  Selectivity
    Achievement gap (math)                   5.017         3.925
    Achievement gap (verbal)                 5.129         4.116
    Track heterogeneity (math) (log)         1.536          .174
    Track heterogeneity (verbal) (log)       1.580          .147

  Electivity
    Proportion choosing own track             .644          .185

  Inclusiveness
    Proportion in academic track              .307          .208
    (Proportion in academic track)^2          .137          .179

  Scope
    Track immobility                          .437          .264
    Honors rigidity                           .443          .247
    Remedial rigidity                         .528          .221

Note: Means and standard deviations were computed
using High School and Beyond design weights (Jones et al.
1983). Unweighted observations are 20,762 students and
883 schools.



who reported the college track as sophomores 
took math and science courses that were possibly 
or definitely college-oriented. By contrast, 64 
percent of nonacademic-track students took math
and science courses that were definitely not col- 
lege-oriented. Similarly, Gamoran (1987) showed
that students who said they were in the college 
track took more academic courses and more ad- 
vanced academic courses, especially in math and 
science, compared with other students. 

Other Student-Level Variables 

The student-level equation (equation 1) describes
the predictors of achievement within each school.
In addition to track position, it includes three items
drawn from student questionnaires: sex, minor- 
ity status (black or Hispanic), and socioeconomic 
status (a composite consisting of the mean of 
nonmissing standardized values for mother's and 
father's education, father's occupation, family 
income, and home artifacts). Equation 1 also
includes sophomore-year performance on the math
and verbal tests, as well as on tests of science and 
writing achievement. These control variables are 
associated with tracking and with achievement, 
and they are included to purge the estimated track 
differences from differences in the types of stu- 
dents assigned to various tracks. Using a similar 
set of within-school predictors of track locations, 
Gamoran and Mare (1989) found that estimates
of bias due to differential selection to tracks were 
reduced to nearly zero. 

School-Level Variables 

Catholic schools are indicated by a dummy vari- 
able coded 1 (versus 0 for public schools). I also 
calculated a measure of school mean socioeco- 
nomic status by averaging student socioeconomic 
status within schools. This variable is included 
as a control variable when estimating effects on 
school mean achievement (productivity), so that 
apparent effects of Catholic schools and struc- 
tural conditions do not simply reflect differences 
in the socioeconomic contexts of the schools. In 
equation 2, school mean socioeconomic status is 
a "contextual effect," i.e., an effect of school so- 
cioeconomic status on average achievement in 
the school over and above the effect of individual 
socioeconomic status on student achievement 
within the school (Heyns 1986). The student-level
control variables adjust mean achievement for 
"compositional" differences, and school mean 
socioeconomic status is included so that sectoral 
and structural effects are estimated apart from 
"contextual" effects, which may operate through 
mechanisms not addressed in this study. The re- 
maining school-level variables describe the struc- 
tural dimensions of tracking. 

Selectivity. I constructed two types of selectivity indicators. One type
reflects the initial achievement gaps between tracks, computed for each
school as the difference between the average test scores of college-track
sophomores and those of noncollege-track sophomores. The second type,
track heterogeneity, is the pooled within-track variance in sophomore
test scores for each school. Because of a high negative skew, I
transformed the variances logarithmically. I computed two indicators for
each type, one for math achievement and one for verbal achievement. Large
achievement gaps between tracks and less within-track heterogeneity
(i.e., smaller track variances) indicate more selective tracking systems.
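
As a rough illustration of the two selectivity indicators, the sketch below computes an achievement gap and a logged pooled within-track variance for one hypothetical school. The scores, the function names, the degrees-of-freedom divisor, and the base-10 logarithm are assumptions made for this example; the text does not specify them.

```python
import math

def mean(values):
    return sum(values) / len(values)

def achievement_gap(academic, nonacademic):
    """Difference between mean academic-track and mean nonacademic-track scores."""
    return mean(academic) - mean(nonacademic)

def pooled_within_track_variance(academic, nonacademic):
    """Pooled variance: squared deviations from each track's own mean,
    summed over both tracks and divided by (N - 2). The divisor is an assumption."""
    def sum_of_squares(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)
    dof = len(academic) + len(nonacademic) - 2
    return (sum_of_squares(academic) + sum_of_squares(nonacademic)) / dof

# Hypothetical sophomore test scores for one school.
academic = [24.0, 26.0, 28.0]
nonacademic = [18.0, 20.0, 22.0, 20.0]

gap = achievement_gap(academic, nonacademic)
pooled = pooled_within_track_variance(academic, nonacademic)
log_heterogeneity = math.log10(pooled)  # log base is an assumption

print(gap, pooled, round(log_heterogeneity, 3))
```

In this reading, a larger gap and a smaller logged variance would both mark a more selective tracking system.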

On average, the sophomore achievement gap between tracks was about 5
points in both subjects, but the average pooled within-track variance
(about 39) was almost as large as the typical total school variance
(about 45). This suggests a possible weakness in the selectivity
measures; there may be further differentiation within tracks that these
measures do not capture (Oakes 1985).

Electivity. Electivity is the proportion of students in the school who
said they chose their curricular programs. This measure relies on
students' perceptions of electivity, but as I noted earlier, students'
perceptions of whether they chose their tracks are probably more relevant
for the tracking-achievement relation than the objective circumstances
through which assignment occurs. Table 1 shows that, on average, nearly
two-thirds of students in a school believe they chose their tracks.

Inclusiveness. Inclusiveness is indicated by the proportion of students
in a school's academic track.⁸ To allow for the anticipated nonlinear
effects of inclusiveness, I included a quadratic term for this variable.
Inequality between tracks is expected to be greatest when inclusiveness
is very high or very low; this would be indicated by a negative linear
effect and a positive quadratic term. School productivity is expected to
rise with increasing inclusiveness, but at a declining rate; this would
be indicated by a positive linear effect and a negative quadratic term.

Scope. I calculated three indicators of scope. The first, track
immobility, is a measure of agreement between students' sophomore-year
and senior-year track positions. This variable is a kappa statistic
(Cohen 1960); it indicates the extent



⁸ Kilgore's (1991) measure of inclusiveness was the proportion of
students in a school in the academic track adjusted for achievement. In
her study, inclusiveness was a dependent variable. Here inclusiveness is
an independent variable, and its effects are estimated on outcomes that
are adjusted for prior achievement and background variables.
Conceptually, the simple proportion is appropriate for my purposes
because students' notions of where they are in the academic hierarchy are
likely to be influenced by the absolute size of the academic track in
their school, not by the size of their academic track relative to that in
other schools with similar compositions (Page 1991).



to which students tend to remain in the same track over time.⁹ I use the
kappa statistic rather than simple proportions of students moving in and
out of the college track because it is independent of differences in the
marginal distributions of students across tracks. A kappa value of 1
indicates no mobility between the sophomore and senior years, whereas 0
indicates students were as likely to move as to stay. (Negative values
for kappa are also possible, but they are unlikely in this situation
because they would indicate a tendency for students to shift tracks more
often than remaining.) The kappa statistic for track immobility was
computed separately for each school. Table 1 shows an average of .437,
indicating a moderate amount of mobility, consistent with previous work
(Gamoran 1987).
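
The kappa statistic for track immobility can be sketched in a few lines. The function name and the example proportions below are hypothetical; the calculation follows the usual definition of kappa as observed agreement minus chance agreement, divided by one minus chance agreement, with chance agreement built from the 1980 and 1982 marginal proportions.

```python
def track_immobility_kappa(both_academic, both_nonacademic,
                           academic_1980, academic_1982):
    """Kappa for agreement between 1980 and 1982 track positions.

    All arguments are proportions of a school's students (hypothetical inputs).
    """
    nonacademic_1980 = 1.0 - academic_1980
    nonacademic_1982 = 1.0 - academic_1982
    # Observed agreement: students in the same track in both years.
    observed = both_academic + both_nonacademic
    # Chance agreement: products of the marginal proportions, summed.
    expected = (academic_1980 * academic_1982
                + nonacademic_1980 * nonacademic_1982)
    return (observed - expected) / (1.0 - expected)

# Hypothetical school: 30% academic in both years, 50% nonacademic in both,
# and 40% academic in each year taken separately.
kappa = track_immobility_kappa(0.30, 0.50, 0.40, 0.40)
print(round(kappa, 3))
```

With no movement at all (observed agreement of 1) the function returns 1, and when observed agreement equals chance agreement it returns 0, matching the interpretation given in the text.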

The other two indicators are also kappa statis- 
tics: Honors rigidity is the extent to which stu- 
dents who reported taking honors math classes 
also take honors English classes; remedial rigid- 
ity is the extent to which students in remedial 
math also take remedial English. High values for 
track immobility, honors rigidity, and remedial 
rigidity indicate wide scope in tracking systems. 

RESULTS 

I estimated three HLM models for each of the 
two achievement outcomes. Model 1 is a baseline 
model, which produces estimates for the within-school
equation (equation 1), and for the variance
components of the parameters that differ
among schools (i.e., $\beta_{0j}$ and $\beta_{1j}$ from equations 1,
2, and 3). 

Model 2 adds sector (Catholic versus public 
schools) as a predictor of between-school differences
in track effects and in mean achievement,
adjusted for differences in composition. (Model 



⁹ The formula for kappa is $(P_o - P_e) / (1 - P_e)$, where $P_o$ is the
proportion observed and $P_e$ is the proportion expected by chance
(Agresti 1990). For example, track immobility in a school is computed as:

$$\frac{(P_{Ac} + P_{NAc}) - (P_{Ac80} \times P_{Ac82} + P_{NAc80} \times P_{NAc82})}{1 - (P_{Ac80} \times P_{Ac82} + P_{NAc80} \times P_{NAc82})}$$

where $P_{Ac}$ is the proportion in the academic track in both years;
$P_{NAc}$ is the proportion in the nonacademic track in both years;
$P_{Ac80}$ and $P_{Ac82}$ are the proportions in the academic track in
1980 and 1982; and $P_{NAc80}$ and $P_{NAc82}$ are the proportions in the
nonacademic track in 1980 and 1982. (Multiplying and summing the
marginals as indicated yields the cell proportions expected by chance.)
Table 2. Gamma Coefficients from HLM Analyses of 1982 Math Achievement and 1982 Verbal Achievement:
High School and Beyond Survey, 1980 and 1982

                                         1982 Math Achievement               1982 Verbal Achievement
Predictor Variable                       Model 1     Model 2     Model 3     Model 1     Model 2     Model 3

STUDENT-LEVEL EQUATION
Math achievement, 1980                   .666***     .663***     .660***     .095***     .091***     .090***
                                         (.007)      (.007)      (.007)      (.007)      (.007)      (.007)
Verbal achievement, 1980                 .089***     .085***     [illegible] .589***     .585***     .584***
                                         (.007)      (.007)      (.007)      (.007)      (.007)      (.007)
Science achievement, 1980                .153***     .155***     .158***     .246***     .250***     .250***
                                         (.014)      (.014)      (.014)      (.013)      (.013)      (.013)
Writing achievement, 1980                .159***     .156***     .154***     .244***     .240***     .238***
                                         (.013)      (.013)      (.013)      (.013)      (.013)      (.013)
Sex (1 = female)                         -.804***    -.809***    -.817***    -.190**     -.196**     -.203**
                                         (.070)      (.070)      (.070)      (.069)      (.069)      (.069)
Ethnicity (1 = black or Hispanic)        -.641***    -.58[?]***  -.637***    -.926***    -.899***    -.908***
                                         (.082)      (.082)      (.083)      (.080)      (.081)      (.082)
Socioeconomic status                     .742***     .600***     .582***     .708***     .573***     .569***
                                         (.050)      (.053)      (.053)      (.050)      (.052)      (.052)
Academic track                           1.440***    1.592***    .047        .940***     1.047***    .139
                                         (.080)      (.085)      (.731)      (.076)      (.081)      (.864)
Intercept (adjusted school achievement)  20.133***   20.116***   19.069***   22.748***   22.692***   21.815***
                                         (.044)      (.046)      (.446)      (.043)      (.043)      (.469)

SCHOOL-LEVEL EQUATIONS
Effects on between-track inequality
  Catholic                               --          -.665**     -.372       --          -.284       .102
                                                     (.245)      (.286)                  (.232)      (.287)
  Selectivity
    Achievement gap                      --          --          -.026       --          --          -.032
                                                                 (.023)                              [illegible]
    Track heterogeneity                  --          --          .066        --          --          .019
                                                                 (.404)                              (.479)
  Electivity
    Proportion choosing own track        --          --          .804        --          --          .041
                                                                 (.490)                              (.483)
  Inclusiveness
    Proportion in academic track         --          --          -3.235      --          --          .180
                                                                 (1.737)                             (1.[?]34)
    (Proportion in academic track)²      --          --          4.830*      --          --          -.901
                                                                 (1.974)                             (1.970)
  Scope
    Track immobility                     --          --          2.811***    --          --          1.625***
                                                                 (.374)                              (.381)
    Honors rigidity                      --          --          .262        --          --          .387
                                                                 (.315)                              (.316)
    Remedial rigidity                    --          --          -.084       --          --          .216
                                                                 (.330)                              (.330)

Effects on adjusted school achievement (productivity)
  Catholic                               --          .839***     .530**      --          1.119***    1.064***
                                                     (.145)      (.177)                  (.137)      (.167)
  School mean socioeconomic status       --          .917***     .614***     --          .770***     .542***
                                                     (.125)      (.144)                  (.119)      (.132)
  Selectivity
    Achievement gap                      --          --          .007        --          --          -.013
                                                                 (.012)                              (.011)
    Track heterogeneity                  --          --          .358        --          --          .262
                                                                 (.247)                              (.259)
  Electivity
    Proportion choosing own track        --          --          -.380       --          --          .508
                                                                 (.279)                              (.262)
  Inclusiveness
    Proportion in academic track         --          --          3.784***    --          --          1.600*
                                                                 (.823)                              (.777)
    (Proportion in academic track)²      --          --          -2.852**    --          --          -.916
                                                                 (.916)                              (.865)
  Scope
    Track immobility                     --          --          -.520*      --          --          .040
                                                                 (.210)                              (.203)
    Honors rigidity                      --          --          -.184       --          --          -.311
                                                                 (.180)                              (.171)
    Remedial rigidity                    --          --          .355        --          --          -.208
                                                                 (.191)                              (.181)

* p < .05   ** p < .01   *** p < .001

Note: Numbers in parentheses are standard errors. N = 883 schools and 20,762 students.
2 also includes school mean socioeconomic status as a predictor of
school achievement.) In Model 3, structural dimensions of tracking are
added to show how these organizational conditions affect school
achievement and the gaps between tracks in achievement. Table 2 displays
the results of the three models for each of the two achievement outcomes
(1982 math scores and 1982 verbal scores).

Baseline Models 

The baseline model (Model 1 in Table 2) presents estimates for the
student-level equation. Each of the eight predictors exerts a significant
impact on the math and verbal achievement scores.¹⁰ Consistent with
previous research, the average effect of academic-track membership is
positive, reflecting increasing between-track inequality.

The baseline model also provides estimates of 
residual variance for the two coefficients that were 
allowed to vary between schools: the academic 
track effect, and school mean achievement ad- 
justed for the composition of the student body. 
These parameters are displayed in the top panel 
(Model 1) of Table 3. The chi-square tests indicate
significant variation between schools in adjusted
mean achievement in both subjects. The
impact of tracking on math achievement also var- 
ies significantly between schools, but the degree 
of variation in tracking's effect on verbal achieve- 
ment is much smaller and is not statistically sig- 
nificant. Thus, additive models of track effects in 
previous research have been appropriate for ex- 
amining verbal achievement but incomplete for 
studying math achievement. 

The next step is to try to account for observed 
variation among schools in adjusted mean 
achievement and track effects. Although there is 
little variation to explain in the case of track ef- 
fects on verbal achievement, I assess the model 
for that parameter as well as the others for com- 
parative purposes. 

Sector Effects 

Model 2 in Table 2 addresses the question of 
whether track effects and overall achievement 

¹⁰ The student-level results agree with prior studies except for the
negative effect of being female on verbal achievement. Previous analyses
of these data found no significant sex differences in reading and
vocabulary scores (Gamoran 1987). The negative sex effect reflects the
inclusion of prior writing achievement, an area in which females have a
substantial advantage.






vary between Catholic schools and public schools. 
In the analysis of between-track inequality, the 
Catholic-school coefficients are negative, indicat- 
ing smaller achievement gaps between tracks in 
Catholic schools, but the effect is statistically significant only for
math achievement. In that subject, academic-track students in public
schools differ from their nonacademic counterparts by 1.592 points, net
of background and prior achievement. For students in Catholic schools,
the difference between tracks is only (1.592 - .665) = .927 points, or
about 42 percent smaller.
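
The arithmetic behind the sector comparison can be checked with a short sketch. The variable names are illustrative; the two coefficients are the Model 2 math values quoted in the paragraph above.

```python
# Model 2 math coefficients quoted in the text.
public_track_gap = 1.592      # academic-track effect in public schools
catholic_adjustment = -0.665  # Catholic-school effect on the track gap

# The track gap in Catholic schools is the public gap plus the adjustment.
catholic_track_gap = public_track_gap + catholic_adjustment

# Proportional reduction of the gap relative to public schools.
reduction = -catholic_adjustment / public_track_gap

print(round(catholic_track_gap, 3))  # 0.927
print(round(100 * reduction))        # 42
```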

In the analysis of adjusted school achievement, 
the results show the familiar Catholic-school ad- 
vantage on both math and verbal tests, even after 
allowing for the positive contextual effect of mean 
socioeconomic status. Especially in math, then, 
Catholic schools have less inequality between 
tracks in a context of higher overall achievement, 
supporting Hypotheses 5a and 5b. The differences
between Catholic schools and public
schools may result in part from differences in
the structure of tracking in the two sectors.

Effects of the Organization of Tracking 

Model 3 in Table 2 shows the impact of the structural characteristics of
tracking on the achievement gaps between tracks and on adjusted school
achievement. Track immobility, a measure of scope indicating whether
students tend to remain in the same tracks over time, leads to greater
inequality between tracks in both math and verbal achievement. Other
things being equal, the gap between tracks in a very rigid tracking
system (defined as one standard deviation above the mean, or a kappa
statistic of .701) is wider than the gap between tracks in a very
flexible system (one standard deviation below the mean, or kappa = .173)
by almost 1.5 points in math and more than 0.8 points on the verbal test.
In math, track immobility also reduces achievement overall, but this
finding is not replicated for verbal achievement. Thus, Hypothesis 4a is
supported for both subjects and Hypothesis 4b is supported only for math.
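
The comparison of rigid and flexible systems can be reproduced approximately as follows. This sketch assumes the Table 1 mean and standard deviation of track immobility (.437 and .264) and reads the Model 3 track-immobility coefficients in Table 2 as 2.811 for math and 1.625 for verbal; the variable names are illustrative.

```python
# Table 1 values for track immobility (kappa).
mean_kappa = 0.437
sd_kappa = 0.264

rigid = mean_kappa + sd_kappa     # one S.D. above the mean
flexible = mean_kappa - sd_kappa  # one S.D. below the mean

# Model 3 coefficients for track immobility on the between-track gap
# (as read from Table 2; treat these readings as an assumption).
coef_math = 2.811
coef_verbal = 1.625

# Difference in the track gap between rigid and flexible systems.
math_difference = coef_math * (rigid - flexible)
verbal_difference = coef_verbal * (rigid - flexible)

print(round(rigid, 3), round(flexible, 3))
print(round(math_difference, 2), round(verbal_difference, 2))
```

Under these assumptions the differences come to roughly 1.48 and 0.86 points, in line with "almost 1.5" and "more than 0.8" in the text.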

Between-track inequality in math achievement 
is greater when inclusiveness is high or low, and 
smaller when inclusiveness is moderate. This is 
indicated by the negative linear coefficient (-3.235) 
and positive quadratic coefficient (4.830) for the 
proportion of students in the academic track. This 
finding supports Hypothesis 3a. What do these 
coefficients mean in substantive terms? Evaluat- 
ing the effects of inclusiveness at the sample av- 
erages for all other school-level variables yields 









Table 3. Chi-Square Tests for Homogeneity of Parameter Variance in the HLM Models:
High School and Beyond Survey, 1980 and 1982

Model and                          Residual                              Percent of Variance
Parameter                          Variance     D.F.     χ²              Explained

MODEL 1 (BASELINE EFFECTS)
Math
  Track effect                     .551         872      973.4*          --
  Adjusted school achievement      .881         872      1819.4***       --
Verbal
  Track effect                     .243         872      903.3           --
  Adjusted school achievement      .757         872      1700.9***       --

MODEL 2 (SECTOR EFFECTS)
Math
  Track effect                     .591         871      [illegible]     0.0
  Adjusted school achievement      .711         870      1631.8***       19.3
Verbal
  Track effect                     .263         871      902.7           0.0
  Adjusted school achievement      .561         870      1479.5***       25.9

MODEL 3 (ORGANIZATIONAL EFFECTS)
Math
  Track effect                     .107         863      880.6           80.6
  Adjusted school achievement      .664         862      1562.7***       24.7
Verbal
  Track effect                     .207         863      877.0           15.1
  Adjusted school achievement      .532         862      1436.3***       29.8

* p < .05   ** p < .01   *** p < .001



the following results: When inclusiveness is at the sample mean (.307),
the gap between the college track and the noncollege track in math
achievement is 1.26 points; when inclusiveness is low (.10), the
between-track gap increases to 1.52 points; but when inclusiveness is
very high (.75), between-track inequality also increases to as much as
2.09 points on the math test. This pattern, however, does not hold for
verbal achievement.
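
The predicted gaps can be recovered from the linear and quadratic coefficients reported above (-3.235 and 4.830) together with the stated gap of 1.26 points at the sample mean. The sketch below is one way to organize that calculation; the function and constant names are illustrative, and all other school-level variables are assumed to be held at their sample averages, as in the text.

```python
LINEAR = -3.235      # coefficient for proportion in academic track
QUADRATIC = 4.830    # coefficient for the squared proportion
MEAN_INCLUSIVENESS = 0.307
GAP_AT_MEAN = 1.26   # between-track math gap at the sample mean

def inclusiveness_term(p):
    """Contribution of inclusiveness p to the between-track gap."""
    return LINEAR * p + QUADRATIC * p ** 2

def track_gap(p):
    """Predicted math gap at inclusiveness p, anchored at the sample mean."""
    return GAP_AT_MEAN + inclusiveness_term(p) - inclusiveness_term(MEAN_INCLUSIVENESS)

# Inclusiveness level at which the quadratic predicts the smallest gap.
minimum_at = -LINEAR / (2 * QUADRATIC)

for p in (0.10, 0.307, 0.75):
    print(f"inclusiveness {p:.3f}: gap {track_gap(p):.2f}")
print(f"smallest predicted gap near inclusiveness {minimum_at:.3f}")
```

The quadratic reaches its minimum at an inclusiveness of about one-third, which is why the gap is smallest in moderately inclusive systems and grows toward either extreme.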

Inclusiveness of a tracking system also affects 
school mean achievement, not only in math but 
on the verbal test as well. The positive linear 
effect and negative quadratic coefficient are consistent
with a positive impact at a declining rate
as specified by Hypothesis 3b. Thus, Hypothesis
3b is supported for both subjects and Hypothesis 
3a is supported for math. 

In contrast to the hypotheses for sector, scope, 
and inclusiveness, which are generally supported, 
I found no support for hypotheses about the 
electivity and selectivity of tracking systems. Most 
coefficients for electivity and selectivity are small, 
and none are statistically significant. Hence, the 
data are not consistent with Hypotheses 1a, 1b,
2a, or 2b.

For math achievement, the Catholic-school ef- 
fects on the impact of tracking and on adjusted 
school achievement decline from Model 2 to 
Model 3 (as does the effect of school socioeco- 
nomic status on mean achievement). The sector 
difference in inequality drops by 44 percent 
(-.665 to -.372) and is no longer statistically 
significant, and the Catholic-school advantage in 
mean achievement drops by 37 percent (.839 to 
.530). This pattern is consistent with the argument
that lower inequality and higher productivity
of Catholic schools result in part from differences
in the structure of tracking. The pattern,
however, is weakly replicated in the verbal 
analysis. 
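
The two percentage declines quoted above follow directly from the Model 2 and Model 3 math coefficients. A minimal sketch, with a hypothetical helper name:

```python
def percent_decline(before, after):
    """Percentage by which a coefficient shrinks toward zero between models."""
    return 100 * (abs(before) - abs(after)) / abs(before)

# Catholic-school effect on between-track inequality in math (Model 2 to Model 3).
inequality_decline = percent_decline(-0.665, -0.372)

# Catholic-school effect on adjusted mean math achievement (Model 2 to Model 3).
productivity_decline = percent_decline(0.839, 0.530)

print(round(inequality_decline), round(productivity_decline))  # 44 37
```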

Given the much greater variation in between- 
track inequality for math achievement as com- 
pared to verbal achievement, it is not surprising 
that I had greater success explaining between- 
school variation in the track effect on math 
achievement. In Table 3, the bottom panel (Model 
3) shows the decline in residual variance after 
the school-level predictors are added. About 80 
percent of between-school variation in track ef- 
fects on math achievement is explained, com- 
pared to only around 15 percent of the variance 
in track effects on verbal achievement. With sec- 
tor and structural conditions taken into account, 
remaining variation in the track effect on math 
achievement is no longer statistically significant. 
For adjusted school achievement, the final model 
accounts for about 25 percent of the variance in 
math scores and nearly 30 percent of the vari- 
ance in mean verbal achievement, and signifi- 
cant residual variation remains. 
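
The percentages of variance explained can be approximated from the residual variances in Table 3 by comparing each Model 3 variance with its Model 1 baseline. The sketch below assumes the residual variances read .551 and .107 (math track effect), .243 and .207 (verbal track effect), .881 and .664 (math school achievement), and .757 and .532 (verbal school achievement); small differences from the published percentages reflect rounding of the variances.

```python
def percent_explained(baseline_variance, residual_variance):
    """Share of baseline between-school variance removed by the predictors."""
    return 100 * (baseline_variance - residual_variance) / baseline_variance

# (Model 1 variance, Model 3 variance), as read from Table 3.
variances = {
    "math track effect": (0.551, 0.107),
    "verbal track effect": (0.243, 0.207),
    "math school achievement": (0.881, 0.664),
    "verbal school achievement": (0.757, 0.532),
}

explained = {name: percent_explained(b, r) for name, (b, r) in variances.items()}

for name, value in explained.items():
    print(f"{name}: {value:.1f} percent")
```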

DISCUSSION AND CONCLUSIONS 

In this study, the question of how tracking affects 
achievement elicits a more complex answer than 
it has in the past. In general, the analyses indicate 
that the effects of tracking depend in part on the 
structure of the tracking system. This claim is best 
supported for math achievement, but it also holds 
to some extent for verbal achievement. Schools
with less mobility in their tracking systems tend 
to have greater between-track inequality in both 
subjects, and they have lower overall math scores. 
Tracking systems that are high or low in inclu- 
siveness also exhibit wider gaps between tracks 
in math achievement. Average math and verbal 
scores are higher in more inclusive systems, al- 
though the gains from inclusiveness accrue at a 
declining rate. Finally, Catholic schools not only 
have higher overall achievement, net of measured
background variables, but for math they also have 
less inequality between tracks, supporting previ- 
ous speculation (Gamoran and Berends 1987). 

Why are the patterns generally sharper for math 
achievement? Although track effects on inequal- 
ity are evident for both verbal and math achieve- 
ment, the variability of track effects is insignifi- 
cant for the verbal test. I posited two mecha- 
nisms for track effects, reflecting social-psycho- 
logical and instructional differentiation. Between- 
school differences in social-psychological mecha- 
nisms should be no less salient for verbal achieve- 
ment than for math achievement. However, in- 
structional differentiation may be more variable 
among schools in math than in English. Hence, 
the more variable effects of tracking on math 
achievement may reflect greater between-school 
differences in the organization of math instruc- 
tion. Still, track immobility contributed to in- 
creased between-track inequality in verbal as well 
as math achievement. This finding is consistent 
with the view that peer-group effects, which may 
be accentuated by permanent track assignments, 
are linked to inequality between tracks in both 
subjects. 

Although the residual variance was initially
greater for track effects on math achievement, 
the variance that remained unexplained was 
greater for verbal achievement. Other aspects of 
tracking systems, unexamined in this study, may 
explain varied track effects on verbal achieve- 
ment. For example, schools may vary in the cul- 
ture or ethos of tracking — tracking may be a 
clear symbol of students' future directions in some 
schools, while its significance in other schools is 
more vague (Gamoran and Berends 1987). Pre- 
sumably, tracking's impact would be magnified 
where its power to confer status is greater. Sym- 
bolic differences among schools, which were not 
addressed in the data, may be linked to variation 
in track effects on verbal achievement (Lamont 
and Lareau 1988). 

Some structural characteristics of tracking systems did not exhibit the
predicted effects. The inconsistent and insignificant effects of
electivity may indicate that two processes cancel out: Greater electivity
may lead to increased motivation, as I argued, but may also promote
between-track differences in aspirations as specified by
Sorensen (1970). This issue could be addressed
by examining peer-group formation in elective 
and nonelective tracking systems. Such analysis 
would show whether elective systems promote 
more homogeneous friendship groups that in turn 
lead to more powerful peer-group effects, as 
Sorensen predicted. The study of peer groups in 
different types of tracking systems may also re- 
veal whether more permanent track assignments 
encourage within-track friendship formation, as 
Sorensen also argued, a prediction that is consis- 
tent with my results for track immobility. 

The absence of effects for selectivity may re- 
flect a weakness of the measures, particularly the 
indicator of track heterogeneity: The conceptual
model refers to heterogeneity of students' classes,
but the data address only the heterogeneity of 
tracks. If tracks are more finely differentiated 
than my measures reveal — Oakes (1985), for 
example, described ability-grouping within tracks 
as common — then the analysis may have missed 
the actual impact of reducing heterogeneity for 
track effects and for average achievement. 

Overall, my findings underscore the importance 
of assessing contextual variation in microsocial 
processes. Although the results are consistent with 
prior research for the general case — on average, 
belonging to the academic track is beneficial for 
achievement — the advantage is not the same in 
all schools, at least not for math achievement. The 
academic-track advantage is less in schools with 
more flexible and (for math) moderately inclu- 
sive tracking systems. At the same time, my re- 
sults should not encourage a haphazard search for 
contextual variation. An a priori conceptual frame- 
work should suggest what dimensions of the con- 
text need to be examined. In this study, the frame- 
work for understanding the aggregate-level dif- 
ferences was built on knowledge of how the 
microlevel processes occur. 

The results also draw attention to the value of 
examining tracking's variable effects on produc- 
tivity as well as inequality. The finding of less 
between-track inequality in math scores in Catho- 
lic schools, for example, does not by itself indi- 
cate that Catholic schools used tracking more 
successfully. The results could have occurred 
through lower scores in the academic track. By 
studying productivity as well, I confirmed that 
lower inequality occurred along with higher overall
achievement, suggesting that the narrower gap
in Catholic schools occurs because low-track stu- 
dents are brought up, not because high-track stu- 
dents are held down (Hoffer et al. 1985). Con- 
versely, the results for track immobility suggest 
that inflexible tracking systems have greater in- 
equality along with lower average achievement, 
presumably reflecting especially poor conditions 
in the nonacademic tracks of such schools. The 
implications of the findings for inclusiveness are 
even more complex: In math, between-track in- 
equality is lower when inclusiveness is moder- 
ate, but productivity is higher when inclusive- 
ness is high. Consequently, an educator must 
choose between maximizing overall achievement 
in the school — usually a significant goal — and 
minimizing inequality between tracks within a 
school. 

Quantitative analysis makes the world appear 
simpler than it really is. Are the benefits of added 
complexity worth the difficulties of concept- 
ualization, estimation, and interpretation? In this 
case, I think the enhanced theoretical understand- 
ing and the potential policy benefits justify the 
effort, particularly with regard to math achieve- 
ment. 

Adam Gamoran is Professor of Sociology and Educational
Policy Studies at the University of Wisconsin,
Madison. His main research interest has been the effects
of stratification in school systems, especially the
relation between tracking and achievement, and the 
role of classroom instruction as a mechanism under- 
lying differences between tracks in achievement. He 
is spending the 1992-1993 academic year as a 
Fulbright Scholar at the University of Edinburgh, 
where he is studying the impact of curriculum stan- 
dardization on changes in levels and inequality of at- 
tainment in Scottish secondary education. 
This review discusses the effects of community involvement on students who face
multiple impediments to success in schools. The first part of the article conceptualizes
community involvement as a typology of four processes of social change: conversion,
mobilization, allocation of resources, and instruction. Illustrations of these processes
are drawn from research and programmatic literature. The second part of the article
considers the effects of the varied forms of involvement in a review of 13 evaluations of
interventions implemented with significant input from community entities. Overall, the
studies indicate that programs can have positive effects on school-related behavior and
achievement as well as attitudes and risk-taking behavior. The concluding section
identifies gaps in the research and offers a framework for future studies.

Communities have always played important roles in students' intellectual and 
psychosocial development, but in the last decade educators, youth advocates, and 
policymakers have called for increased community participation to solve the prob- 
lems of educationally disadvantaged students. Numerous projects are underway, 
their existence heralded in the popular and professional literature. 

However, optimism and involvement have not been matched by systematic efforts 
to understand these initiatives in the context of evidence about the community's 
impact on students. This gap in knowledge can be attributed to the isolation of 
disciplines, the focus on specific projects rather than general components, and the 
ambiguity of concepts about community. This article introduces a definition of 
community involvement, provides a needed synthesis of findings from evaluations of 
community involvement projects, and offers a conceptual framework for future 
research. 

In this article, the term educationally disadvantaged is applied to students who face 
multiple impediments to success in school. Poor African-American and poor His- 
panic students comprise the bulk of those considered to be at-risk of negative 
educational outcomes, such as illiteracy and school dropout. 

Natriello, McDill, and Pallas (1990) estimate that in 1988, 25 million of the nation's 
63.6 million children under the age of 18 were educationally disadvantaged when any 
one of five risk factors (race/ethnicity, poverty, family structure, language
background, and mother's education) was used. They project that the number of
educationally disadvantaged children will increase substantially by the year 2020,
when the number of impoverished children will be 16.5 million — a 33% increase over 
the 12.4 million children in poverty in 1987. 



Community involvement consists of the actions that organizations and individuals 
(e.g., parents, businesses, universities, social service agencies, and the media) take 
to promote student development. Such community involvement is typically de- 
scribed in terms of specific roles that community actors play in supporting students 
(see Carnegie Council on Adolescent Development, 1989; Children's Defense Fund, 
1986; Constable & Walberg, 1988; Oakes, 1987; W.T. Grant Foundation, 1988). 
Community refers both to locales, such as neighborhoods, and to social interactions 
(e.g., relations among a network of social service providers) that can occur within or
transcend local boundaries. 

Nettles (1989) conceptualized these varied forms of involvement as a typology of 
four change processes: conversion, mobilization, allocation of resources, and instruc- 
tion. The first, conversion, refers to the process of bringing the student from one 
belief, or behavioral stance, to another. The second process, mobilization, includes 
actions to increase citizen and organizational participation in the educational proc- 
ess. Allocation refers to activities wherein community entities provide resources 
(such as social support and services) to children and youth. Finally, instruction 
embraces actions designed to assist students in their intellectual development or in 
learning the rules and values that govern social relationships in the community. 

In Nettles's formulation, the four processes embrace natural, or unstructured, 
occurrences of involvement as well as structured actions that constitute projects and 
formal interventions. Moreover, in interventions, one process may predominate 
(e.g., as instruction does in tutoring programs) or a combination of two, or more, 
processes may be evident. Adopting this typology as a framework for the following 
review of the literature provides a perspective on mobilization, allocation, and 
instruction. Unfortunately, although the literature is sprinkled with anecdotes about 
students who suddenly began to achieve or who suddenly ceased to behave in 
destructive ways as the result of exposure to a powerful message or charismatic 
person, systematic research on this kind of phenomenon with disadvantaged students 
is rare. Also absent from the literature are examples of natural occurrences of 
resource allocation and mobilization. Thus, I will discuss in the following section 
structured forms of allocation and mobilization as well as formal and informal 
examples of instruction. 



Mobilization embraces actions that fall under such labels as citizen participation, 
neighborhood organizing, partnerships for school reform and improvement, legal 
action, and social movements. The targets of such involvement are institutions, 
political jurisdictions, and geographic areas; therefore, effects on students are likely 
to be indirect. For example, citizen and parental participation on school governing 
boards may produce changes in the curriculum or in teacher attitudes towards 
students. These changes, in turn, may affect student achievement levels. 



Defining Community Involvement 

The predominant focus of the literature on mobilization is the improvement of prac- 
tices that promote change. There are general guides for community action (Alinsky, 
1971; Rothman, Erlich, & Teresa, 1976, 1981) and handbooks that suggest highly 
specific actions to link schools and students with other entities in the community (see 
for examples, Asche, 1989; Bain & Herman, 1989; Merenda, 1986; Otterbourg, 
1986). These guides commonly include principles of practice based on studies of 
specific cases. One particularly active area of research is citizen participation in 
school decision making (see Williams, 1989, for a review and synthesis). Recent 
attention has focused on partnerships between schools and community entities such 
as businesses, social service agencies, universities, cultural institutions, and commu- 
nity-based organizations. 

Partnerships. In 1975, Jesse Jackson initiated and led a national crusade to involve 
parents, businesses, students, school staff, and other segments of local communities 
in the pursuit of excellence in education. The crusade eventually led to the PUSH for 
Excellence (PUSH-EXCEL) Project, a three-year, federally funded demonstration 
project that established a highly visible network of educational partnerships. The 
evaluation of the PUSH-EXCEL Project (S.R. Murray, Murray, Gragg, Kumi, & 
Parham, 1982) documented the extensive grass-roots organizing that preceded the 
demonstration projects in Chicago, Kansas City, Chattanooga, and Denver. The 
evaluation also documented the result of the PUSH-EXCEL Project's efforts to 
develop both a stable, active base of citizen support and a menu of school and 
community-based activities to produce improvements in student attendance, aca- 
demic motivation, sense of responsibility, grades, and test scores. The PUSH- 
EXCEL Project encountered many difficulties (e.g., defining roles of the various 
partners and establishing mechanisms for the sustained involvement of partners) in 
its efforts to transform the vision of its founder into concrete applications. 

Case studies of other partnerships (e.g., see Levine & Trachtman, 1988; Pine & 
Keane, 1989) indicate that the problems the PUSH-EXCEL Project experienced are 
common in school/community alliances. These difficulties can undermine collabora- 
tion unless the implementation process includes mechanisms to foster the relation- 
ship between partners. In a review of urban school/community alliances, Ascher 
(1988) cited as critical features in sustaining partnerships: 

commitment, egalitarian decision-making, a sense of ownership by participants at all 
levels, clarity about roles, clarity and flexibility about both methods and goals, an 
ability to bridge different institutional cultures, training, and patience concerning the 
collaborative process itself. (p. 14) 

She concluded that the principles of forming and maintaining successful collabora- 
tives are similar across types of partnerships. 

Mann (1987a, 1987b) examined business/school partnerships, a popular type of 
collaboration in the 1980s, in 23 large cities and in a stratified random sample of 85 
U.S. public school districts. Data were collected through telephone interviews with 
superintendents and other officials and through review of documents. Mann found 
that formal partnerships were concentrated in big cities and were useful in connecting 
urban schools and their predominantly low-income and minority populations to the 
business community. However, partnerships competed with other interests (such as 
local youth organizations) for funds. 

Current national mobilizations include the Black Church Project, which is sponsored 
by the American Association for the Advancement of Science. This project, through 
a network of 15 sites, trains church staff and volunteers to conduct science and math 
workshops, public science days, and science and mathematics career days (George, 
Richardson, Lakes-Matyas, & Blake, 1989). Another major effort is One to One, 
which has this national goal: "By 1995, every young person who might benefit from a 
mentoring relationship will have the opportunity to be matched with a caring part- 
ner" (One to One, 1990, p. 2). This match is to be accomplished through the 
formation of local Leadership Councils, pilot neighborhood projects, and the Na- 
tional Mentoring Partnership. Finally, the ASPIRA Association, a community-based 
organization devoted to improving the status of Hispanic children and youth, has led 
a number of national efforts, including the organizing of clubs that provide Latino 
students with opportunities to receive academic and career counseling and to learn 
leadership skills. Two other national efforts are the Public Policy Leadership pro- 
gram and the Hispanic Community Mobilization for Dropout Prevention (ASPIRA 
Association, 1990). 

Involvement as Allocation of Resources 

Community involvement often entails the allocation of resources to eliminate 
disadvantages in students' access to resources. For example, court battles to end 
school segregation were among the first of many community actions to reallocate 
educational resources. Other actions serve to remove barriers to access, alter the 
incentive structure, and provide social support for student efforts. 

Access to resources. To remove barriers to student use of health and social services, 
states and cities are placing clinics and other resources in or near public schools. As of 
Spring 1989, according to a survey conducted by the Center for Population Options 
(cited in Kirby, Waszak, & Ziegler, 1989), 90 providers were operating 150 health 
clinics in 32 states and 91 communities. The majority (59%) of students who used the 
clinics were African American. Twenty-five percent were White, and 12% were 
Hispanic. The clinics provided a variety of medical, counseling, educational, repro- 
ductive, and family planning services and were typically found in low-income areas. 
Program operators included community health clinics, nonprofit organizations, hos- 
pitals, medical schools, departments of public health, and school systems. 

Other involvement efforts focus on increasing low-income youths' access to em- 
ployment and training. An example is the Youth Incentive Entitlement demonstra- 
tion, which the Manpower Demonstration Research Corporation managed and evalu- 
ated from 1978 to 1980. Through the efforts of the Youth Incentive Entitlement 
demonstration, over 80,000 low-income youths in 17 cities applied for work in jobs 
paying the minimum wage; in some cities, the employment rates of minority and 
White youth were equalized. Private businesses accounted for slightly over half of the 
work sponsors (Walker, 1984). 

Incentives for effort and achievement. It is often assumed that the incentive struc- 
ture for impoverished youth should be altered and that community actors can play a 
major role in creating and providing incentives, thereby encouraging students to 
invest in constructive pursuits. Two of the most widely cited examples of community 
involvement have attempted to provide incentives that will encourage disadvantaged 
students to graduate from high school and then either attend college or enter the 
work force. 

The first of these is The Boston Compact, which was initiated in 1982 with a formal 
agreement to the effect that businesses, labor unions, and the Boston city govern- 
ment would provide priority hiring of graduates of the public high schools. In return, 
the school system contracted to reduce dropout rates, increase attendance, and 
assure that graduates were competent in basic skills. 

The second example is the "I Have A Dream" Foundation, which Eugene Lang 
established in 1981 with a spontaneous pledge to the graduating sixth-grade class of 
East Harlem's Public School 121. Lang promised to pay the college costs of each 
student who finished high school. He subsequently provided the students with 
various supports to facilitate their efforts to complete school. As of October 1989, the 
program had been replicated in 32 cities of 23 states, with funds provided by 132 
sponsors of 9,000 students (J.M. Sesnick, personal communication, December 6, 
1989). 

Both of these programs guaranteed valuable long-term incentives for staying in 
school. In Boston, the incentives were not of sufficient power to reduce the school 
dropout rates (Hargroves, 1986). Data on the original class of Dreamers from Public 
School 121 suggested a more positive outcome than that achieved through The 
Boston Compact. Ninety percent of the students either obtained, or were expecting 
to obtain, high school or general equivalency diplomas. The expected rate was 75% 
("I Have a Dream" Foundation, 1989). However, it is not clear whether the results 
for the Dreamers were due to the incentives or due to the assistance that the students 
received from program staff. The next section briefly considers the functions of such 
support in the lives of disadvantaged children and youth. 

Social support. Informal helpers play important roles in many communities. In 
poor neighborhoods, interpersonal resources may serve as substitutes for, or exten- 
sions of, institutional services and supports (McAdoo, 1980; Stack, 1974). Despite the 
apparent strength of naturally occurring support, evidence from a variety of sources 
suggests that disadvantaged students either have limited access to resourceful adult 
helpers, rely heavily on peers, have parents and other family members who lack 
social support, or are impeded by the demands of members of their social networks. 

In programs for disadvantaged students, social support can occur informally, in the 
context of relationships that are structured chiefly to provide academic, psychologi- 
cal, social, or other services. In the evaluation of six school-based clinics, 43% to 63% 
of the students who used the clinics cited the staff's caring as one of the five most 
important reasons for using the clinics (Kirby et al., 1989). When support only occurs 
as a byproduct of another component, it can be unpredictable, episodic, and un- 
targeted. Planned support that provides for sustained, goal-directed relationships is 
therefore offered in interventions, although the quality of relationships varies. Freed- 
man (1988) examined relationships between elder mentors and at-risk youth in five 
programs and found three types: primary relationships, in which a high degree of 
trust, enjoyment, and attachment were apparent; secondary relationships, which 
exhibited the characteristics of primary relationships, but in a less developed form; 
and nonsignificant relationships that were marked by distance (see Flaxman, Ascher, 
& Harrington, 1988, for further discussion of mentoring and other adult/youth 
relationships). 



The final form of involvement is instruction, which refers to actions that support 
intellectual development and social learning. Instruction in the community can occur 
informally as indicated in studies of language socialization and studies of the role of 
parent as teacher. According to these studies, social interactions in the home and the 
wider community are important contexts in which children learn emergent literacy 
skills, self-regulation of cognitive and other tasks, and other skills and behaviors 
needed for performance in schools and performance in daily communication (Gun- 
dlach, Farr, & Cook-Gumperz, 1989; Heath, 1989; Scott-Jones, 1984, 1989). 

Instruction in the community can also take place in organized settings, such as 
tutoring programs, clubs, and teams. According to a recent survey (U.S. Depart- 
ment of Education, 1990), tutoring was the focal component in the estimated 1,701 
college-sponsored programs wherein college students tutored or mentored disadvan- 
taged elementary and secondary students. Churches also play a major instructional 
role through their programs of religious and moral education as well as their struc- 
tured tutoring programs. For example, in 1986, the Congress of National Black 
Churches began Project SPIRIT, a pilot program in five African-American churches 
in Atlanta, Indianapolis, and Oakland. The project was funded by a grant from the 
Carnegie Foundation and featured academic tutoring, instruction in life skills, and 
morale building (Carnegie Corporation of New York, 1987-1983). 

Other organized community efforts to foster academic performance provide addi- 
tional resources and support for learning and stimulating students' desire to achieve 
in school and in other settings. The National Council of La Raza is implementing the 
Innovative Education Project through Hispanic community-based organizations in 
Kansas City, Chicago, and Houston (Orum, 1988). The Project is designed to 
address the academic and nonacademic needs of low-income Hispanic students and 
their families. The Project on Adolescent Literacy conducted a national search for 
successful literacy programs for young adolescents (Davidson & Koppenhaver, 1988) 
and discovered both in-school and summer programs. In addition, it documented 
programs that community-based organizations were implementing for children dis- 
advantaged by poverty or by limited proficiency in English. 

The role of parents and family in both informal and structured instruction has 
received considerable attention (for reviews, see Epstein, 1987; Epstein & Scott- 
Jones, 1988; Henderson, 1987; Scott-Jones, 1984; Slaughter & Epps, 1987; Tangri & 
Moles, 1987; Weisbaum, 1990), and evidence suggests that parental participation in 
children's efforts to learn in schools as well as in the broader community can have 
positive effects on students' school achievement. In the context of a general review of 
the effectiveness of community involvement in improving outcomes for disadvan- 
taged children and youth, the next section includes an illustrative study of parental 
involvement in a tutoring program. 



This review discusses the effects of the varied forms of involvement defined above. 
The studies that are examined were drawn from the empirical literature on interven- 
tions that are characterized by significant input from community entities. 



The studies reviewed below were located through a search of the ERIC and 
PsycLIT databases and through a manual search of current newsletters (e.g., Educa- 
tion Week) and journals. All studies identified in the search were examined if they 
met the following criteria: (a) they addressed academic and other effects of 
programs, or projects, that were developed, or administered, by entities either 
outside formal educational systems or staffed primarily by community residents or 
employees of public or private service agencies and (b) a substantial proportion of 
the participants in the program came from low-income families or had other charac- 
teristics that were associated with educational failure. Studies were selected for 
review if their designs included some form of comparative analysis and if their reports 
(published and unpublished) were sufficiently detailed to permit evaluation of sam- 
ple composition, measures used, level of program implementation, and quality and 
type of data analysis. 

Table 1 summarizes the key components of the interventions and the major 
features of the research designs. Five of the programs were forms of resource 
allocation: the Cambridge-Somerville Youth Study (Powers & Witmer, 1951), the 
Chicago Area Project (Schlossman, Zellman, & Shavelson, 1984), Project Redirec- 
tion (Polit & Kahn, 1985), the Pregnancy Prevention Program (Zabin, Hirsch, Smith, 
Streett, & Hardy, 1986), and the Resource Mother Home Visit Program (Unger & 
Wandersman, 1985). Each of these programs offered several kinds of activities, 
including counseling or other forms of social support and services, such as informa- 
tion and referral, recreation, or family planning. 

Two of the programs employed instruction as the sole or main component. They 
included EXTRA (Sheley, 1984) and the Parent-Child Tutoring Program (Mehran & 
White, 1988). Four of the programs combined the involvement forms of instruction 
and allocation of resources: Career Beginnings (Cave & Quint, 1990), the Cities in 
School federal demonstration (C.A. Murray et al., 1980), the Peer Tutoring and 
Mentoring Project (Turkel & Abramson, 1986), and Project RAISE (McPartland & 
Nettles, in press). The PUSH-EXCEL federal demonstration (S.R. Murray et al., 
1982) was characterized by all four forms of involvement. Also included among the 
studies reviewed is an evaluation of a group of school-based clinics, which are not 
connected to each other in a formal sense (Kirby et al., 1989), but which feature 
several forms of resource allocation. 

The instructional programs targeted elementary school students, whereas the 
other programs targeted students in elementary and secondary school. Although the 
studies are diverse in type and in the populations sampled, the sample can provide an 
overview of the kinds of effects that are possible for programs that have significant 
input from community entities. 



[Table 1. Key components of the interventions and major features of the research designs] 

Methodological Considerations 

Community involvement for many years was viewed as good in and of itself. With 
the introduction of well-funded, multisite efforts and the concomitant inclusion of 
community involvement as a topic in discourse about school reform, program evalua- 
tion techniques were applied to assessments of community efforts. Although the 
literature still includes numerous examples of assessments whose results are 
either ambiguous or so restricted as to be of use only for anecdotal purposes, the 
studies reviewed attempted to address methodological problems that are predictably 
associated with community involvement programs. Four of these issues (selection 
bias, fluctuations in sample composition, level of exposure to the treatment, and 
measures used) are discussed briefly below. 

Selection bias. Selection bias is a particular threat in research on community- 
related interventions, because students typically can select themselves as program 
participants. Such selection can be tied to a number of factors that may render 
comparison groups of nonvolunteers ipso facto nonequivalent. For example, students 
who elect to participate may be motivated to take advantage of opportunities (or 
have parents who are unusually motivated to return forms giving their children 
permission to participate). Moreover, program operators are often reluctant on 
ethical grounds to assign randomly to treatment and control groups the students who 
seek services or are otherwise eligible for them. 

In the studies reviewed, various approaches were used to control for possible initial 
differences between treatment and comparison groups. Random assignment to 
experimental and control groups from a pool of students eligible for the program was 
achieved in three of the studies. These included the two action research studies (the 
Parent-Child Tutoring Program and the Resource Mother Home Visit Program), 
where the investigators had authority over the design and implementation of both the 
program and the research, and the study of Career Beginnings, in which program 
administrators agreed to comply with the requirements for a rigorous research 
design. 

To achieve control of extraneous and other unwanted variance in evaluations of 
programs whose recruitment and selection procedures were nonrandom and unre- 
lated to the research designs, investigators used such approaches as comparison sites 
(Project Redirection), comparisons between levels of participation in the program 
(PUSH for Excellence), and multiple comparison groups, each designed to hold 
constant a given factor, or factors (McPartland & Nettles, in press). Many of the 
studies used statistical controls (such as baseline values of the dependent variables) as 
well. 

Changes in sample composition. Most of the studies that collected data at more 
than one point in time, or at some point subsequent to the formation of treatment and 
comparison groups, were affected by the loss of participants from groups that 
constituted the original research sample. Two of the studies (Project Redirection and 
Career Beginnings) reported the results of tests of the representativeness of the 
respondents relative to the original sample. There were significant differences in pertinent 
variables between respondents and nonrespondents in the Career Beginnings study, 
but the investigators concluded that the evaluation results were representative of a 
broad section of the original sample. The Redirection study used a two-stage statisti- 
cal procedure to analyze the effects of attrition and found them to be negligible. 

The foregoing examples addressed the issue of sample reduction due to respondent 
decisions. A related consideration is attrition due to program decisions about which 
students should remain in a program. It is not uncommon for program operators to 
reduce the numbers of participants to achieve a higher quality or a more manageable 
level of services. Such selection, which can violate the most elegant of research 
designs, occurred in the Project RAISE study, and the investigators therefore 
conducted separate analyses that compared the results before and after the student 
rosters were reduced. 

Level of exposure to the treatment. Level of exposure to treatment is an issue for 
research design and analysis in that the kinds of services and activities that constitute 
a formal program are often available in other settings to members of potential and 
actual comparison groups. For example, in the PUSH for Excellence study, a major 
component was motivational speeches by a highly visible and charismatic public 
figure. Not only did the leader present speeches in program schools but he was also 
featured on a number of nationally televised talk and news shows. 








In the Project Redirection and Career Beginnings studies, respondents in the 
comparison groups reported levels of the use of services comparable to those of 
program participants. The investigators addressed these unanticipated findings as 
issues in the interpretation of the results. Thus, in the Project Redirection study, the
original question of the evaluation, which was whether any intervention was effective 
in assisting teenaged mothers, was modified to ask whether Project Redirection was 
more effective than other service models. 

Measures. In their assessment of the potential of community-based, after-school 
literacy programs, Davidson and Koppenhaver, authors of the Project on Adolescent 
Literacy (1988), commented that measures of success for these programs differ from
the standards used in school programs: 

Schools must attend to group objectives and standards, but after-school programs 
are free to focus intensively on individual goals. After-school programs deem them- 
selves successful when they can engage a young person on a continuing basis, 
promote success in some area of learning, excite interest in some aspect of reading or 
writing, and help the individual to see that literacy does have a place in his or her 
future. (p. 132)

These authors recommended the use of qualitative measures of success that use as 
evidence such sources as program attendance records, student journals, and struc- 
tured observations and interviews. 

The studies in the present sample relied on measures that were used in evaluations 
of school and other institutional programs (e.g., school attendance, grades, stan- 
dardized test scores, police contacts). Interviews were used to capture data on 
aspirations and plans, sense of efficacy, contraceptive use, and other self-reports, but 
structured observations were rare. 

The different kinds of measures may be grouped into three categories. The first is
investments, defined as students' commitments of their time, energy, and other
resources in pursuit of legitimate opportunities that will yield a future return
(Schwarz, 1980; Nettles, 1989). The focal investments in the studies reviewed were
attending school, using contraception, participating in extracurricular activities, and
working after school or part-time. The second category includes measures of attitudes,
such as sense of efficacy and attitudes toward school. The final category
embraces measures of attainment, such as test scores, grades, promotion, and school 
completion. 

Virtually all the studies addressed student engagement in the program through 
measures of level of exposure to the treatment. As Table 1 shows, the programs 
varied widely in the level of continuing participation, or use of services. 

Effects on Individual Students 

The programs sought positive changes in students enrolled as official participants 
and recipients of services, in entire schools, or in entire neighborhoods. The follow- 
ing section addresses the findings at the individual level of analysis. 

Academic outcomes. Positive effects on reading skills were sought in five pro- 
grams: each of the tutoring programs, Project RAISE, and Cities in Schools. Al- 
though significant improvement from pre- to posttest was found for program partici- 
pants in the Cities in Schools evaluation, in the absence of a comparison group, 
factors other than the program may have influenced the results. The findings of the 
study of the Parent-Child Tutoring Program indicated that substantial gains on all 
measures were obtained for students whose parents participated at planned levels.
For example, among pairs in which the experimental parents implemented the 
tutoring according to design, the average effect size was .96 across three tests (the 
Comprehensive Test of Basic Skills, the Woodcock Johnson Psycho-Educational 
Battery, and the Harrison Criterion Referenced Test) approximately nine months 
after the intervention. No significant differences were found between program 
participants and comparisons in the remaining four studies. 
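
The effect size of .96 reported above is a standardized difference between group means. The review does not say which formula the original study used, so the following is only a minimal sketch of one common definition (the difference in means divided by a pooled standard deviation). The function name and the score lists are hypothetical and were invented for illustration; they are not data from the Parent-Child Tutoring Program.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(treatment, control):
    """Difference in group means divided by the pooled standard deviation.

    This is one common effect-size definition; the original study may
    have used a different one.
    """
    n_t, n_c = len(treatment), len(control)
    pooled_variance = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


# Hypothetical posttest scores, invented for illustration only.
tutored = [52, 55, 58, 61, 64]
comparison = [47, 50, 53, 56, 59]

effect_size = standardized_mean_difference(tutored, comparison)
print(round(effect_size, 2))
```

Under this definition, an effect size near 1 means that the average tutored student scored about one standard deviation above the average comparison student.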

Gains in mathematics skills, which were the goals of two programs (Project RAISE 
and EXTRA), were obtained only in the EXTRA program. Students in the tutoring 
program showed greater improvement on the math subscale of the Comprehensive 
Test of Basic Skills in 64% of pairs matched according to age, grade level, course 
grades, test scores and teacher. However, matched-pair differences in both reading 
and mathematics were related to gender and tenure in the program. Among long- 
term pairs, girls in the program showed greater improvement over their control twins 
than boys did with respect to their matches. Differences in mean score changes were 
26.67 and 83.29 for boys and girls respectively in math and 14.25 and 78.00 for boys 
and girls respectively in reading. 
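
The matched-pair result for EXTRA (greater improvement in 64% of pairs) can be read as the share of pairs in which the tutored student gained more than his or her match. The sketch below assumes that reading. The gain scores are hypothetical, and treating ties as not favoring the program is an assumption of this example rather than a rule stated in the evaluation.

```python
def share_of_pairs_favoring_program(pairs):
    """Fraction of matched pairs in which the program student gained more.

    Each pair is (program_student_gain, matched_student_gain).
    Ties are counted as not favoring the program (an assumption).
    """
    favoring = sum(
        1 for program_gain, match_gain in pairs if program_gain > match_gain
    )
    return favoring / len(pairs)


# Hypothetical pre-to-post gain scores, invented for illustration only.
gain_pairs = [(12, 5), (8, 9), (15, 7), (3, 3), (10, 2)]

share = share_of_pairs_favoring_program(gain_pairs)
print(f"{share:.0%} of pairs favored the program student")
```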

Improvements in grades were examined in the studies of Peer Tutoring and 
Mentoring, Project RAISE, Cities in Schools, and the PUSH-EXCEL Project. In the 
latter, level of participation in the program was positively associated with grade point 
average after the effects of preprogram grade point average were taken into account. 
Students in Project RAISE received better English grades (but not math grades) than other
students in the same schools. The size of the positive program effect was .14. 
However, the RAISE students' English grades remained below the average for the 
school district. In addition, there were no significant differences in grade point 
averages between program participants and comparisons in the evaluation of the Peer 
Tutoring and Mentoring project. 

Attendance. Improved school attendance was sought in four programs (Peer
Tutoring and Mentoring, Cities in Schools, the PUSH-EXCEL Project, and Project
RAISE). In the studies of Project RAISE and Cities in Schools, significant differ- 
ences in the anticipated direction were found between comparison students and 
program students in the same school and grade. For example, the reduction in 
absences due to participation in RAISE was nearly 3%, which translates into about 
one week of extra days of attendance in a 180-day school year. Among students in one 
of the cities in the Cities in Schools program, there was nearly a 7% decrease in
absences in the 8th to 9th grades compared to no decrease among comparison 
students. No significant effects were found in studies of the other two programs. 
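
The conversion from a percentage reduction in absences to days of attendance is simple arithmetic. The sketch below assumes, as the text implies, that the reduction is expressed as a percentage of the days in the school year; the function name is hypothetical.

```python
def extra_days_attended(absence_reduction_pct, school_year_days=180):
    """Convert a reduction in absences (as a percentage of school days)
    into additional days attended over one school year."""
    return school_year_days * absence_reduction_pct / 100


# A reduction of nearly 3% in a 180-day year is at most about 5.4 days,
# which is roughly one five-day school week.
raise_days = extra_days_attended(3)
print(raise_days)
```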

Persistence in school. Five of the programs (Cities in Schools, Project Redirection, 
Project RAISE, Career Beginnings, and the Resource Mother Home Visit Program) 
sought to induce students to remain in school or to make satisfactory progress toward 
graduation from high school or from a postsecondary program. The evaluations of 
three of the programs found that the programs were effective in this regard. The 
Project Redirection study found that, 12 months after the program began, a signifi- 
cantly higher proportion of the participant group than of the comparison group was in 
school or had graduated (56% compared to 49%). At eight months postpartum, a 
higher proportion of mothers in the visited group of the Resource Mother program 
remained in school than their counterparts in the comparison group, and a signifi- 
cantly higher percentage of participants (47.9%) than of controls (43.4%) in the 
study of the Career Beginnings program was in college after one year. The results for
these three studies are particularly credible because the research designs in the 
evaluations were stronger than in the others reviewed here. 

Other short-term effects. The studies also produced evidence that community- 
initiated, or community-operated, programs can produce desirable effects on stu- 
dent attitudes toward the self and school and on risk-taking behavior. For example, a 
heightened sense of personal efficacy was associated with participation in the Cities 
in Schools and the PUSH-EXCEL Project evaluations. Students in the Peer Tutoring 
and Mentoring Project had more positive attitudes toward school (as measured by 
pre- and postprogram scores on the Quality of School Life Scale).

In the evaluation of school-based clinics, in three sites in which clinic users were 
compared to clinic nonusers and where clinics made contraceptives available to 
students by dispensing them or by providing vouchers, students who elected to use 
the clinics for contraception were more likely to have used contraceptives at last 
intercourse than students who did not use the clinic for this service. Similarly, in the 
study of Project Redirection, one year after the program began, a greater proportion 
of participants than of comparison subjects reported contraceptive use at last intercourse.
Moreover, during the first year of the study, a significantly lower percentage
of project participants than of comparison subjects had a repeat pregnancy. There 
were no differences in employment status between the groups at 12 months, although 
at the 12-month interview the proportion of participants ever employed was greater 
than that of comparison subjects ever employed. 

Long-term effects. Of the 13 studies reviewed, three of them (the evaluations of the 
Cambridge-Somerville Youth Project, of Career Beginnings, and of Project Redirection)
measured effects subsequent to respondents' participation in program activities.
The effects on participants' persistence in college achieved by the Career
Beginnings program were mentioned above. In the Project Redirection study,
measures were taken at 12 months, when the average participant was ending her 
involvement in the program, and at 24 months. Overall, differences found at 12 
months between comparison participants and program participants in subsequent 
pregnancy, employment status, and school enrollment, or school completion had 
vanished by 24 months. Exceptions to this were found in analyses of subgroups: 
Desired effects were sustained for program participants who were extremely disad- 
vantaged relative to other participants. 

The longitudinal study of the Cambridge-Somerville Youth Project suggests that 
intervention models may provide short-term, but not continuing, amelioration of the 
problems associated with disadvantage. Because this insight is essential for understanding
the effects of interventions, the program and the original research design
are briefly described.

Dr. Richard Clark Cabot initiated the Cambridge-Somerville Youth Project in
Massachusetts in 1935 in an effort to curb delinquency. The police, churches, 
schools and social service agencies recommended boys aged 5 through 13 for 
participation in the project. Boys considered to be "average" and "difficult" were 
identified and paired according to delinquency prediction scores and personal and 
home background. One member of each pair was assigned to the control group and 
the other to the treatment that began in 1939. For approximately 5 years, the project 
arranged for academic and medical services as needed, linked the boys to youth and 
other community organizations, and sent one fourth to summer camp. Twice a 
month, counselors visited the boys' families. The initial evaluation of the Cambridge-
Somerville Youth Study found no significant differences in official delinquency or in 
social adjustment between treatment and control groups (Powers & Witmer, 1951). 
However, among pairs whose treatment twin received services that removed barriers 
to adequate socialization, the treatment group had a higher level of social adjust- 
ment. 

In 1975, McCord (1978) traced the whereabouts of 506 men who had participated 
in the treatment and control groups. Of the 480 who were located, 48 had died. 
Questionnaires were sent to the remainder (208 in the treatment and 202 in the 
control) and data from the Massachusetts archives on the 340 men still living in the 
state were examined. 

The men in the treatment group rated the program very positively, but the analyses 
indicated that the program had no subsequent impact on delinquency. Moreover, the 
participants in the treatment group, compared to those in the control group, tended 
to (a) show more signs of mental illness, (b) have had at least one stress-related 
disease, (c) be in low prestige occupations, (d) show signs of alcoholism, (e) report 
more often that their work was less satisfying, and (f) have committed a second crime. 
There were no statistically significant differences between the treatment group and 
the control group in 50 other comparisons. 

Subsequent analyses (McCord, 1981) indicated that participants who had partic- 
ularly long, early, or frequent contact with the counselors showed the strongest 
adverse impact. After testing several interpretations of the effects (e.g., that the 
program increased dependency), McCord concluded that the program 

seems to have raised the expectations of its clients without also providing the means 
for increasing satisfactions. The resulting disillusionment seems to have contributed 
to the probability of having an undesirable outcome. (p. 405)

Effects on School and Area Populations 

In 4 of the 13 programs, the entire school, or entire neighborhood, was the target of 
intervention, rather than students who were recognized as official participants. The 
programs included the PUSH-EXCEL Project, the Chicago Area Project, the Pregnancy
Prevention Program, and six school-based clinics.

School attendance. Changes in school attendance were examined in the studies of 
the PUSH-EXCEL Project and the school-based clinics. In the five schools in the 
PUSH-EXCEL study, absences decreased slightly (1-2 percentage points) in all 
schools concurrently with the implementation of the program. In three of the sites in
the evaluation of school-based clinics, there were no significant differences found in 
days absent due to illness in clinic schools versus comparison schools that did not have 
clinics. However, in two schools that opened clinics after the evaluation began, 
students missed fewer days after the clinics opened than they had before the clinics 
opened. 

Pregnancy prevention and reduction of high-risk behaviors. The evaluation of the 
Pregnancy Prevention Program assessed changes over time in students' sexual behavior,
contraception, pregnancy rates, and knowledge and attitudes about reproductive
issues. The investigators used data from school archives and from surveys conducted 
in the two project schools and in the two control schools. Young women who 
attended the school concurrently with the three years of program implementation 
initiated sexual intercourse a median length of seven months later than those who 
attended the school prior to program implementation. In program schools, as compared
to control schools, there were reductions in rates of unprotected intercourse
and pregnancy. 

The evaluation of six school-based clinics found that the clinics had no impact on 
schoolwide pregnancy and birth rates. Nevertheless, at two of the sites, significantly 
greater proportions of students in clinic versus nonclinic schools (75% compared to 
61%) reported using contraceptives at last intercourse. In addition, significantly 
greater proportions of students in pre-clinic schools (66% before the clinic opened 
compared to 75% 2 years after the clinic opened) reported using contraceptives at 
last intercourse. Alcohol consumption and cigarette smoking were measured in four 
sites. Students in three clinic sites reported lower alcohol consumption, and students 
in one of the four clinic sites reported lower cigarette smoking than students in 
nonclinic schools. 

The evaluation of the Chicago Area Project examined areawide reductions in three 
types of delinquency: runaway and ungovernable behavior, police contacts, and 
arrests. The analysis compared the discrepancy between predicted and reported 
rates in six neighborhoods: two program neighborhoods (identified by matching 
individual participants' addresses to specific areas in South Chicago) and four com- 
parison neighborhoods (the remaining neighborhoods in South Chicago). In one 
program neighborhood, all measures of delinquency were lower than expected. In
the second program neighborhood, delinquency was lower than expected on all 
measures except male arrests and police contacts. Two comparison neighborhoods 
showed higher rates than expected, and two (one of which was served by a commu- 
nity center not in the program) showed lower rates. 

Summary 

Table 2 summarizes the direction of short-term effects found in the studies re- 
viewed according to form of involvement. It is clear that the programs can have 
positive effects on school-related behaviors and achievement as well as on attitudes 
and risk-taking behavior. Within types of effects, the consistency of positive out- 
comes for attendance, pregnancy status and contraceptive behavior, and persistence 
in school suggests that community programs may be potentially useful interventions. 
Overall, the effects range from small to substantial. This is not surprising given the 
variations in level of exposure to treatment and quality of research design. 

Also of note is the pattern of outcomes by involvement types. Programs that fall 
either in the allocation or the instruction categories tend to show an overall pattern of 
positive effects. Programs that combine allocation and instruction show a mixed 
pattern. There is only one program that combined the four forms of involvement, and
thus there is no basis for comment on patterns. 

Directions for Future Research 

The literature on the effects of community involvement on disadvantaged students 
is predominantly a literature of program evaluation. The studies reviewed above 
should be considered in light of their occurrence in the history of evaluation research. 
The critiques (Farrar & House, 1983; Stake, 1983) of the earliest studies of grass- 
roots involvement, the Cities in Schools and the PUSH-EXCEL Project evaluations, 
suggested the need for evaluators to respect the essential and unique character of a 
given intervention, to employ compatible methods in assessing effectiveness, and to
maintain clear boundaries between evaluation and formulation of program policy.
Recent studies, such as the evaluation of Project Redirection and the evaluation of 
school-based clinics, reflect advances that have been made in design analysis, in data 
analysis, and in the increased cooperation among program implementers, funders, 
and evaluators in the planning and implementation of research. The establishment of 
evaluation networks (e.g., the National Network of Teen Pregnancy Programs Doing
Impact Evaluations), information on evaluation targeted at program operators (see
Philliber, 1989, for an illustrative handbook), and the application of approaches that
contribute to program development (cf., Gottfredson, 1984a) bode well for continu- 
ing progress in the study of the effects of community involvement. 

However, a major weakness in the existing evaluation research is the lack of 
attention to the effectiveness of components of programs. Research is needed to help 
program developers confront two difficult challenges: identifying effective practices 
from among the scores of programs that now exist and fostering student participation 
in program activities. Studies to increase participation levels, for example, might 
focus on structuring incentives that meet the following criteria: Incentives should be
appropriate in terms of the students' developmental level, abilities, and interests; 
incentives should be inexpensive; and incentives should not undermine program 
goals or community and family norms, values, or resources. 

Two major gaps are apparent in the general field of community involvement 
research. One is the dearth of studies on the quality and effects of naturally occurring 
and institutionalized occurrences of the four forms of involvement. Also needed are 
studies that answer questions about the relationship between intervention and infor- 
mal involvement. For example: Does planned involvement facilitate or impede 
naturally occurring community processes? What is the nature of informal instruction 
in local businesses, churches, settlement houses, and youth organizations? Can
planned support improve informal practices? What factors stimulate, or impede, 
involvement? 

The second major problem with the existing research on community involvement 
is its conceptual isolation from research on how communities and other ecologies 
affect disadvantaged students. The programmatic efforts and studies included here 
were guided by highly specific practical or theoretical rationales, rather than by a 
general conceptual framework. These specific rationales are reflected in the narrow 
range of questions answered in the research, the focus on indicators of school success 
and on adjustment to the exclusion of other measures, and the absence of information
that would be helpful in designing effective treatments and strengthening informal
practices. This article proposes a conceptual framework to assist both practitioners
and researchers in integrating existing studies in varied disciplines, in
sketching an outline for new lines of inquiry, and in identifying new directions for the 
design of interventions. 

This conceptual framework integrates three separate lines of research. The first 
assesses community competence, which refers to the capacity of a community and the 
agents within it to solve problems and to meet the demands of daily life (Barbarin, 
1981; Iscoe, 1974). Communities that function well are in some respects the counterparts
of effective schools. This research literature suggests that competent communities
are characterized by such features as responsiveness to the diverse needs of
members, maximized use of resources, cohesiveness, and a collective sense of well- 
being, of physical security, and of opportunities for individuals to achieve status and 
receive recognition for accomplishments. These and other characteristics can each be
classified under one of three components, namely:

1. community structure, which embraces physical features, social area character- 
istics, and other aspects of the community's resources; 

2. community culture, or climate, which is defined by values, standards, and rules;
and 

3. community processes such as problem solving and allocation of resources 
(Hurley, Barbarin, & Mitchell, 1981). 

This framework encompasses the two most common meanings of community. One 
refers to community as a locality, such as the neighborhood, the city, the block, or the 
catchment area of a school. The other meaning views community as the social 
interactions that occur in formal and informal settings within, and across, locales. 
(See Heller, 1990, and Newmann & Oliver, 1969, for discussions of these meanings 
and their implications.) Understanding these dimensions of community will contrib- 
ute to research in three ways: It will help address the neglect of processes that occur 
naturally in communities, aid in the specification of connections between community 
characteristics and community involvement, and give researchers insight into how 
local variation, which has not been addressed systematically in the literature, shapes 
the direction and intensity of involvement. 

The second line of research evaluates the influence of educational environments on 
student development — how the structure, social climate, and processes of class- 
rooms, schools, dormitories, families, and other institutional settings affect student 
performance, aspirations, attitudes toward school, delinquency, and other behav- 
ioral and cognitive outcomes (Astin, 1968; G. D. Gottfredson, 1984b; Moos, 1979). 
In Moos' (1979) model, students' cognitive, motivational, and coping mechanisms 
link the environmental system (i.e., structure and climate dimensions) and the 
student's personal system (i.e., sociodemographic characteristics, personality, and 
skills) to changes in student values, interests, aspirations, and achievement levels. 
Thus, studies of community involvement should account for student characteristics 
(beyond economic status or academic achievement, two common factors that are 
already used to distinguish subgroups) that may mediate the effects of involvement. 

The third line of research includes studies of involvement, such as the evaluations 
reviewed above, as well as case studies and other forms of research. These studies 
have begun to specify the important student variables that can be influenced by the 
actions of community entities. 

Figure 1 shows the proposed framework. It defines the community as an environ- 
ment characterized by three measurable features: structure, culture or climate, and 
the involvement process. Structure and climate are aspects added to the typology of 
community involvement developed previously and used throughout this article. 
Community structure refers to the nature and organization of the social units and 
physical features within the community's boundaries. Four dimensions of community 
structure dominate in the literature as important targets of involvement or as factors 
affecting student outcomes: the educational resource base, history, social area 
characteristics, and the physical setting. Several studies indicate that structural 
characteristics have a direct effect on student attainment (for reviews, see Mayer & 
Jencks, 1989; Scott-Jones, 1989). The line connecting structure to involvement is 
suggested by case studies of partnerships and citizen participation, which indicate
that community history and the level of resources shape the form and direction of
involvement.

FIGURE 1. Framework for examining community involvement and student progress

Community climate consists of the values, norms, and rules that serve to maintain 
community order and control, to promote extensive social interaction among community
members, and to facilitate individual community members' growth and
progress. This aspect of community has been explored largely through ethnographic 
studies (e.g., Lightfoot, 1978; Anderson, 1976). These studies may be useful starting
points for operationalizing cultural elements that influence involvement and student 
development. With regard to the latter, Ogbu's (1985) cultural-ecological model of 
inner city childrearing and development specifies the competencies that African 
Americans in the inner city expect children to acquire and the cultural factors that 
shape the type and content of such competencies. Thus, this model suggests that
climate may have direct effects on student outcomes. Case studies of local organizing 
in the PUSH-EXCEL evaluation (Kumi, Thompkins, Allen, & Murray, 1979) suggest
that climate is an important influence on the level of mobilization for school improvement
and, hence, on the line connecting climate to involvement.

Community involvement is conceptualized as the typology of the four involvement 
processes used in this review. These processes, singly and in concert, comprise the 
formal and informal actions that individuals and groups undertake either directly to 
foster student development or indirectly to improve or to reform institutions that 
serve youth. As this review suggests, the involvement of community actors can 
stimulate student investments such as attending school, using contraceptives, and 
avoiding high-risk behaviors such as alcohol consumption and delinquency. Attitudi- 
nal shifts and heightened achievement are also outcomes of some forms of involve- 
ment. However, different types of community involvement may not be equally 
effective in producing results. 

Conceptualizing community in this way focuses attention on the aspects of commu- 
nity that affect the intellectual and psychosocial development of children and youths. 
By distilling what is already known about community environments and their effects 
on students, by implementing ambitious action research designs in program evaluations,
and by exploring connections between the various aspects of community,
investigators can contribute to practical and empirical knowledge about ways to 
remove impediments to the progress of disadvantaged students and can create 
environments that nurture these students. 
Using Community Adults as
Advocates or Mentors for At-Risk 
Middle School Students: A Two- Year 
Evaluation of Project RAISE 

JAMES M. MCPARTLAND and SAUNDRA MURRAY NETTLES 
Johns Hopkins University 

The effects on selected student outcomes are evaluated after two years 
of operation of Project RAISE, a multifaceted approach featuring outside
adults as school-based advocates and one-on-one mentors for at-risk 
students at seven middle schools. Positive effects are found on improving 
student attendance and report card grades in English, but not on pro- 
motion rates or standardized test scores. The effects, though sizable, 
were not sufficient to neutralize the academic risks with which students 
entered the program. The positive results were primarily due to three 
of the seven sites. Some evidence supported interpretations that, although 
strong one-on-one mentoring is not an essential component of an effective 
program that uses outside adults to assist at-risk middle school students, 
the RAISE model is much more likely to show positive effects when 
one-on-one mentoring has been strongly implemented. Success may also 
depend on the size and composition of the student group to be served. 
Issues are raised about roles and responsibilities of adult advocates or 
mentors. 

Schools throughout the nation are engaged in programs that use adults 
from the community to help at-risk youth make steady progress through
the middle and secondary grades and complete high school. Two 
general approaches — mentoring and advocacy — are widely viewed as 
promising mechanisms to provide sustained, goal-directed support to 
students. 

Mentoring is commonly defined as a one-to-one relationship between 
a caring adult and a student who needs support to achieve academic, 
career, social, or personal goals. Mentor-student relationships can 
develop naturally or within structured interventions through activities 
designed to arrange, sustain, and monitor matches. Advocacy, as the 
term is currently applied, refers to a supportive relationship wherein 
a resourceful adult (who may be called an advocate, program coor- 
dinator, youth worker, or counselor) works with the same group of 
students over a specified period of time and provides intensive in- 
strumental, material, and emotional support that can include assessing 
students' needs for academic and social services, intervening on the 
students' behalf in schools and other institutions, monitoring students' 
participation in programs, and identifying and brokering formal services. 

Both of these approaches are extremely popular, not only for at- 
risk youth but for other populations as well. For example, a directory 
compiled by the New York State Mentoring Committee (1989) lists 
over 211 mentoring programs in New York State alone. A recent 
survey of college and university tutoring and mentoring programs for 
disadvantaged youth (U.S. Department of Education 1990) reported 
that, of an estimated 1,701 such programs, 63 percent provided men- 
toring and 17 percent had mentoring as the primary focus. 

National mobilizations are under way to promote further the use 
of these approaches. For example, there is the growing number of 
activities under the "I Have a Dream" Foundation. As of October 
1989, the program was under way in over 30 cities in 23 states (Berger 
1989; J. M. Sesnick, personal communication, December 6, 1989). 
Another major effort is One to One, which has the goal of matching 
with a "caring partner" every young person who can benefit from such 
a relationship. This is to be accomplished through the formation of 
local leadership councils, pilot neighborhood projects, and the National 
Mentoring Partnership (One to One 1990). Big Brothers/Big Sisters 
of America has a long history of providing young people with adult 
volunteers in one-to-one relationships. The organization has nearly 
500 affiliates throughout the nation, and the children it serves include 
a large proportion from low-income or single-parent families (Smink 
1990). 

The widespread use of terms such as "mentoring" and "advocacy" 
and the prominence in the media of testimonials about how kids have 
been "turned around" through contact with a mentor or advocate give 
the impression that these are well-defined approaches that effectively 
increase students' motivation and achievement in school, remove barriers 
to student progress in school and the wider community, and help 
students refrain from self-destructive and illegal actions. In fact, there 
is great overlap in the practices that bear these labels, and the labels 
themselves may be used interchangeably. The content of adult roles 
and relationships with students, as well as the desired outcome, varies 
considerably from program to program (see Flaxman et al. [1988] for 
a review). Within programs, the intensity of relationships may also 
vary. Freedman (1988) examined the quality of mentoring relationships 
in five programs that provided at-risk youth with "elder mentors." 
Three types of one-to-one relationships were observed: primary 
relationships, which were characterized by a high degree of attachment, 
trust, importance, and enjoyment; secondary relationships, which 
exhibited the same characteristics as those found in primary relationships, 
but in a less developed form; and nonsignificant relationships, pairings 
that were marked by distrust and distance. 

Research on the effects of mentoring is scant. The available infor- 
mation suggests that mentoring can be a useful but modest approach 
for addressing students' needs. According to Flaxman et al.'s (1988) 
review of the literature, the goals for the relationship should be clear 
and within the mentor's power to achieve, and the mentor must be 
empathetic, able to assess accurately the needs of students, and able 
to apply resources appropriately and regularly. Research on the 
effectiveness of advocacy is also rare; the few studies that exist (see, e.g., 
Murray et al. 1981; Unger and Wandersman 1985) again suggest that 
this form of support works best when the adult role is structured 
around a few well-defined objectives that will in turn help the student 
to undertake specific actions. 

However, few if any of the existing projects have been accompanied 
by quantitative evaluations that include carefully constructed comparison 
groups, statistical controls on initial student input differences, and 
statistical tests of effects on major student outcomes after a reasonable 
period of program operations (Flaxman et al. 1988; Smink 1990). To 
provide an empirical foundation for a discussion of programs that use 
adults from the community to assist the school success of at-risk youth, 
this report shows the results after two years of Project RAISE, a 
well-financed, multi-faceted program pursued by seven different community 
sponsors within the large urban school district of Baltimore. 

RAISE Project Components and Samples 

RAISE started in May 1988 with seven community sponsors that each 
made a seven-year commitment to provide support to groups of 
approximately 60 at-risk students, beginning from the time they enter 
grade six and following them through subsequent middle and high 
school grades. The sponsors include two churches (one predominantly 
black and one predominantly white), two universities (one predominantly 
black and one predominantly white), two large businesses (both pre- 
dominantly white), and one fraternity (predominantly black). According 
to project materials, "the basic RAISE strategy is to create on a large 
scale the kind of sustained caring connections which can make a dramatic 
difference in the lives of very high risk children." RAISE expects to 
improve students' self-esteem and school-related behavior and progress, 
and to reduce high-risk behaviors such as substance abuse and teenage 
pregnancies. 

Key components of the RAISE model include a full-time director 
and support staff who provide overall program development and 
administration for the set of seven sponsors; paid school-based advocates 
for each of the seven sponsors; and volunteer one-on-one mentors 
for each student served by the sponsors. The director and support 
staff are located at the Baltimore Mentoring Institute, a nonprofit 
agency created to manage a number of related activities in the city. 

RAISE is funded by significant grants from two major local foun- 
dations and by annual contributions from the seven sponsoring or- 
ganizations that together will total about $2 million over the seven- 
year project period. The RAISE combination of paid and volunteer 
components is intended to be at levels that could be widely replicated 
elsewhere if the project is proved successful. 

The seven sponsors vary significantly in the degree to which they 
have implemented the RAISE model during the first two years of the 
project; these differences are summarized in table 1. The seven paid school-based 
advocates have worked from the outset with each sponsoring orga- 
nization to serve RAISE students as "part counselor, part friend, and 
role model." The advocates' job includes monitoring attendance, grades, 
and behavior, building a relationship of trust with each student, and 
troubleshooting for individual students when necessary. All sponsors 
have recruited some volunteers to assist the advocates with after-school 
activities such as tutoring and recreation and with periodic events such 
as museum or zoo visits, attending athletic events, roller skating, or 
going to the movies. However, not all sponsors have established the 
one-to-one mentoring component of the RAISE model. 

TABLE 1 

Differences in RAISE Components by Sponsor 

                            RAISE COMPONENT 

          Advocate                    Subsample    Grade 5          Core 
SPONSOR   Established?    Mentors*    Selected?    Reading Score    Size 

A         Yes             +           No           4.78             56 
B         Yes             0           No           5.38             41 
C         Yes             0           Yes          5.18             44 
D         Yes             +           No           4.29             57 
E         Yes             ++          Yes          4.89             49 
F         Yes             +++         Yes          5.73             43 
G         Yes             +++         No           5.20             44 

* Entries indicate that mentors have been provided for no students (zero), no more 
than about one-third of students (+), about one-half of students (++), or all students 
(+++). 

One-on-one mentoring is a particularly demanding component of 
RAISE that few of the sponsors have as yet been able to establish at 
a high level of implementation. The expected mentoring relationship 
with an individual student is one of sustained caring and attention by 
the adult volunteer. Although mentors may help with a student's ac- 
ademic or personal problems, the expected role is different from that 
of a tutor, professional counselor, or social worker (Flaxman et al. 
1988). To achieve a strong supportive mentoring relationship that 
builds students' trust and provides effective role models for positive 
personal development by the students, RAISE expects a strong 
commitment of time and energy from the adult volunteers. The mentors 
must commit to at least one year of weekly contacts that include biweekly 
face-to-face meetings. Mentors are provided with orientation and on- 
going training by RAISE staff and are given regular information by 
the paid advocate about their students' programs and performance in 
school and elsewhere (Baltimore Mentoring Institute 1990). 

One-on-one mentoring has been well established for each RAISE 
student by two sponsors (table 1, F and G), although one of these two 
took until the beginning of the second year of the project to achieve 
its high level of implementation. Three other RAISE sponsors have 
established one-on-one mentoring for some but not all student 
participants, ranging from matching about half of the students with mentors 
in one case (E), to having mentors for about one-third of students for 
the past half year in another case (A), to reaching less than ten students 
with active mentors in the third situation (D). Two sponsors do not 
have one-on-one mentors for any RAISE students but use their adult 
volunteers to assist with group activities or to work with different 
individual students on different occasions. 

Another important way that some sponsors have deviated from the 
original RAISE design involves changes in the samples of students to 
be served. The original seven groups of students were identified for 
each sponsor by designating seven elementary schools in some of the 
city's most impoverished neighborhoods from which the students com- 
pleting grade 5 in May 1988 would be eligible for RAISE. The initial 
design was for each sponsor to work with a group of about 60 students, 
which was thought to be an upper-limit case load for each school- 
based advocate and for sponsors to recruit and train volunteer mentors. 
However, the number of students eligible for RAISE from the designated 
feeder schools was often much greater than 60, and the sponsors coped 
with this greater number in different ways. Three sponsors accepted 
all the eligible students, even though their actual numbers of RAISE 
participants then ranged from 75 to 80. One sponsor who should have 
served 75 eligible students wound up with an actual RAISE sample 
of 67 because of apparent clerical errors in providing school lists, but 
no apparent biases were created by the sample reduction. Three other 
sponsors purposely eliminated about one-third of the students from 
their pools of eligible students (table 1, "Subsample Selected?"). Their 
original samples had ranged in size from 85 to 99, but subsamples 
were selected to achieve in each case a final group of about 60 students 
as actual RAISE participants. The process of selection in these three 
cases was not random, but based on which students showed most initial 
interest in the project, as reflected in two cases by providing home 
information of interest and in the third case by being active in the 
early months of the project. Our tabulations show that the eligible 
students who were eliminated by these three sponsors tended to have 
lower fifth-grade test scores and higher absence rates and were more 
likely to be male and to be designated as special education students. 
This nonrandom elimination of some eligible students introduces some 
complications to the original evaluation design. 

The final actual samples for each sponsor differed in two other 
important ways that derived from initial differences of students and 
their elementary feeder schools. Average grade 5 reading scores for 
students differed by sponsor at the end of the school year when students 
first were selected for RAISE, with a difference of nearly one and 
one-half years in grade-level equivalent scores between the sponsors with 
the least well prepared and the best prepared students (D and F). Sponsors 
also differed in the degree to which their 60 students were dispersed 
across different middle schools, because of elementary school feeder 
patterns. Table 1 shows the number of students in the "core" middle 
school that included the largest number of students for each sponsor. 
Students in core schools should be easiest to serve because of their 
location at the site where the full-time paid advocate resides for each 
sponsor. 

Each factor may have some influence on the effectiveness of each 
sponsor's RAISE programs over the first two years, which we will 
evaluate along a number of student outcome dimensions. 

Evaluation of RAISE Effects 



Because some RAISE sponsors eliminated students from the original 
pool of eligible participants, we adopted the following strategies to 
establish comparison points to evaluate possible effects of RAISE on 
student outcomes. First, we restricted our attention to comparisons 
between RAISE and non-RAISE students who are attending the same 
middle school, leaving the students who had been dropped by RAISE 
sponsors out of both comparison groups. Since each sponsor has a 
core middle school that most of their RAISE participants attend and 
where the paid full-time staff advocates are located, we focus on RAISE 
students who are most likely to receive the strongest assistance from 
the program. By omitting from the analyses all students who had been 
dropped from RAISE, we do not penalize the non-RAISE comparison 
groups with any potential negative bias of the individuals eliminated 
from RAISE. Thus, we are using students who began grade 6 at the 
same time in the same core middle schools to compare RAISE par- 
ticipants with other students who were not eligible for RAISE because 
their elementary feeder schools were not originally selected. 

Second, we statistically control for a number of key student input 
variables with which students entered grade 6. These variables include 
grade 5 spring scores on standardized tests in reading, math, and 
language arts; student's sex and race; and student's age in grade 5, 
which indirectly indicates whether an individual had been left back 
one or more times in elementary school. Statistical controls are achieved 
through multiple regression analyses on selected student outcomes 
that include these input variables as well as a zero-one code for whether 
a student is enrolled in RAISE. 
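
The regression setup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the six student records are made up (they are not project data), a single control (grade 5 reading score) stands in for the full set of input variables, and the helper names `solve_linear_system` and `ols` are hypothetical. The coefficient on the zero-one RAISE code plays the role of the program effect after holding the control constant.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(predictor_rows, outcome):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + list(row) for row in predictor_rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up records: (RAISE zero-one code, grade 5 reading score, absence rate in percent)
students = [
    (1, 4.0, 19.0), (1, 5.0, 17.0), (1, 6.0, 15.0),
    (0, 4.0, 22.0), (0, 5.0, 20.0), (0, 6.0, 18.0),
]
predictors = [(flag, reading) for flag, reading, _ in students]
absence = [rate for _, _, rate in students]

intercept, b_raise, b_reading = ols(predictors, absence)
print(round(b_raise, 2))  # adjusted RAISE vs. non-RAISE difference in absence rate
```

In this toy data the RAISE coefficient is the gap in absence rates between enrolled and non-enrolled students who share the same reading score; an actual analysis would add the remaining input variables and use a statistics package that also reports significance tests.
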
Our strategy of restricting attention to students in core middle schools 
will also hold constant many features of school policy and staffing not 
associated with RAISE, because RAISE and non-RAISE students in 
the same schools will have similar school programs outside RAISE. 
This initial strategy also aids in controlling for student inputs, because 
each core middle school draws from a defined geographic attendance 
area. All of our approaches, however, cannot completely discount possible 
bias in the analyses due to some RAISE sponsors' elimination of some 
student participants from the original eligible pool; thus we will also 
note any residual relationships with sponsors' selection practices. 

Program Effects 

Four sets of student outcomes are examined for possible effects of 
student participation in RAISE. Absence rate for 1989-90 (the second 
year of the RAISE project) is calculated by dividing the number of 
days absent by the number of days on the roll for each individual 
student. 1 Report card averages are calculated by averaging scores from 
the four quarterly marking periods of the 1989-90 school year, to 
establish English grades and overall grade point average. On-grade 
promotion rates are calculated from whether an individual student 
had been promoted at the end of grade 6 and at the end of grade 7. 
Student test score performance is measured by grade-equivalent scores 
on the California Achievement Tests (CAT) administered by the district 
to all students not in special education at the end of the spring 1990 
term. 
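
The first two outcome measures are simple ratios and averages, and a short sketch may make the definitions concrete. The function names and the sample numbers below are hypothetical, and the absence rate is expressed as a percentage on the assumption that this matches how the rates are reported in the results.

```python
def absence_rate(days_absent, days_on_roll):
    """Days absent divided by days on the roll, expressed as a percentage."""
    if days_on_roll <= 0:
        raise ValueError("days_on_roll must be positive")
    return 100.0 * days_absent / days_on_roll


def yearly_average(quarter_marks):
    """Average of the quarterly report card marks for one school year."""
    if not quarter_marks:
        raise ValueError("at least one marking period is required")
    return sum(quarter_marks) / len(quarter_marks)


# Hypothetical student: 18 absences over 180 days on the roll,
# and English marks of 70, 72, 68, and 74 in the four quarters.
example_rate = absence_rate(18, 180)
example_english = yearly_average([70, 72, 68, 74])
print(example_rate, example_english)
```
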

Table 2 summarizes the results of the overall RAISE project, com- 
bining the seven sponsors in a comparison of RAISE and non-RAISE 
students attending the same core middle schools with statistical controls 
for student inputs. Several statistics are provided for assessing the size 
and direction of possible RAISE effects. The unstandardized regression 
coefficient for an individual student's enrollment in RAISE, denoted 
by B, is produced by the multiple regression analyses that also include 
the student input measures. This B is thus an estimate of the difference 
in a selected student outcome due to participation in RAISE in a core 
middle school, controlling for student input differences. Our estimated 
"effect size," ES, is similar to the statistic often used in experimental 
studies and meta-analyses, and is calculated as the difference between 
RAISE and non-RAISE students (given by B) divided by the district 
standard deviation of the relevant outcome measure. The benchmark 
of .20 or greater for ES significance often used in educational research 
gives one standard for our assessments. We also provide a test of 
statistical significance for B, together with the probability (α) that the 
observed difference is due to chance. To help us judge how much 
RAISE effects reduce the actual risk level of students compared to a 
district standard, table 2 also presents the current mean of each outcome 
for RAISE students and for other students in the entire district who 
began grade six at the same time. 
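
The effect size statistic defined above is a single division, sketched here for concreteness. The function names are hypothetical. The district standard deviation of 16.4 is an assumption, back-calculated from the B and ES values the text reports for absence rates; it is not a figure stated in the article.

```python
def effect_size(b, district_sd):
    """ES: the regression coefficient B divided by the district standard deviation."""
    if district_sd <= 0:
        raise ValueError("district_sd must be positive")
    return b / district_sd


def meets_benchmark(es, benchmark=0.20):
    """True when the magnitude of ES reaches the conventional .20 benchmark."""
    return abs(es) >= benchmark


# B = -2.96 is the absence-rate coefficient reported in the text;
# 16.4 is an assumed (back-calculated) district standard deviation.
es_absence = effect_size(-2.96, 16.4)
print(round(es_absence, 2), meets_benchmark(es_absence))
```

Under that assumed standard deviation the sketch gives an ES close to the reported -.18, which falls just short of the .20 benchmark.
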

Table 2 gives evidence of both the potential and limitations of RAISE 
as it has been implemented in the first two years. We observe statistically 
significant positive effects on two outcomes: absence rates and report 
card grades. And although the positive effects are meaningful, the 
average RAISE student still has remaining attendance and grade per- 
formance problems that pose major risk factors for continued success 
in school and for completing high school without dropping out. 

The effect on students' absence rate is shown in the first row of 
table 2. The reduction due to RAISE participation in annual absences 
of nearly 3 percent (B = -2.96) approaches the effect size that most 
analysts take seriously (ES = -.18) and attains a high level of statistical 
significance (α = .002). When translated into the number of extra 
days of attendance in a 180-day school year, the estimated effect is 
that RAISE students will attend about one more week than comparable 
non-RAISE students in the same middle schools (2.96 percent × 180 
= 5.3 additional days). While this increment in attendance is meaningful, 
table 2 shows it still leaves the average RAISE student with an annual 
absence rate of 18.38 percent, which is worse than the district average 
of 16.77 for students at this grade and remains a troublesome risk 
factor for success in later grades. Thus, our results indicate that RAISE 
taken as a whole has potential for improving student attendance, but 
the effects are not yet powerful enough to reduce the average absences 
to desirable rates. Since RAISE is not intended to upgrade the quality 
of schools where the students receive their classroom instruction, it 
remains to be seen whether RAISE alone can produce much greater 
gains in attendance and school performance. 
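
The conversion from a percentage-point reduction in absence rate to added days of attendance can be checked with a short calculation. The function name is hypothetical; the 180-day year and the 2.96 figure come from the paragraph above, and the 7.51 figure is the sponsor C estimate reported later in the discussion of sponsor effects.

```python
def extra_days_attended(absence_reduction_pct, school_year_days=180):
    """Convert a percentage-point drop in absence rate into days of attendance."""
    return absence_reduction_pct / 100.0 * school_year_days


overall_days = extra_days_attended(2.96)    # all sponsors combined
sponsor_c_days = extra_days_attended(7.51)  # sponsor C only

print(round(overall_days, 1))    # about one school week
print(round(sponsor_c_days, 1))  # over two school weeks
```
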

The second and third rows of table 2 present findings of RAISE 
impacts on report card grades. Students enrolled in RAISE are getting 
better grades than other students in the same schools after controlling 
for student inputs, but these grades still remain below the district 
average. The positive RAISE effect is statistically significant for English 
grades but not for math grades. The positive RAISE effect for overall 
grade point average is in between the values for English and math 
and just misses achieving a minimum significance level of .10 (see 
table 2). But in terms of the effect size statistic, the RAISE effects are 
not large by conventional standards. Table 2 also shows that, even 
after the positive effects, the average RAISE student remains below 
district mean report card marks and close to the minimum passing 
grade of 70. Thus, participation in RAISE is shown to help students 
get better grades in English and perhaps in their overall grade point 
average than other comparable students in the same middle schools, 
but the impact is not strong enough to move the average RAISE 
student to a high level of classroom performance. 

The final three rows of table 2 present results for student promotion 
rates and student achievement on standardized tests of reading and 
mathematics. We find no statistically significant differences between 
RAISE and non-RAISE students on any of these outcomes. However, 
the average RAISE student remains well below the district average on 
these outcomes and at levels that raise concerns about the chances of 
success in later grades, especially with regard to low promotion rates 
in middle grades, which are often precursors of dropping out at later 
grades. Table 2 shows that only about two-thirds of RAISE students 
have been promoted in each of the last years, compared to about 
three-quarters of students districtwide who began grade 6 at the same 
time (.662 vs. .759). So, even though RAISE has had some positive 
impacts on student attendance and grades, one of every three RAISE 
students was retained in grade at least once since the program began 
two years ago. Grade retention in elementary and middle grades has 
been shown to be a strong predictor of not completing high school 
(Shepard and Smith 1989), so many RAISE students remain in a high 
risk category for dropping out before high school graduation. 

Sponsor Effects 

A comparison of the seven different sponsors of RAISE programs is 
useful for further judgments on the potential and limitations of the 
first two years of operation, since we can consider the range of impacts 
across sponsors and whether it is related to known variations in the 
implementation of RAISE components or in the students served by 
each sponsor. 

Table 3 summarizes the results for each individual RAISE sponsor 
on student absence rates and English grades, the two outcomes for 
which the overall positive impact of RAISE was most evident. Although 
the estimated sizes of effects are often much larger and more impressive 
in individual cases than the overall sizes reported in table 2, only three 
of seven sponsors report desirable effects that reach statistical significance 
on either of these two outcomes. 

Sponsor C shows the largest differences between RAISE and 
non-RAISE students on both absence rates and English grades. In this 
case, the reduction in absence rates of 7.51 percent translates into 13 
more days of school attendance, or over two weeks of added schooling 
for RAISE participants. The improvement in English grades is over 
two points on average for the RAISE students served by sponsor C, 
which is statistically significant with a very low probability of error. 
Sponsor E also shows sizable and statistically significant effects due to 
RAISE on both outcomes, though not quite as impressive as the results 
for sponsor C. For sponsor F, the estimated effects of RAISE are both 
in the desired direction, but only the result on English grades reaches 
an acceptable level of statistical significance. Thus, the overall positive 
effects observed for the combined sample of all RAISE sponsors were 
primarily produced by three sponsors that had especially powerful 
desirable impacts on their RAISE participants. 

RAISE Components and Effects 

We can find several possible explanations for why some RAISE sponsors 
show stronger positive results by examining additional information 
about each sponsor's own RAISE program. Table 1 provides information 
for each sponsor on the degree to which one-on-one mentoring has 
been instituted (as an indicator of program implementation), the grade 
reading level of the average RAISE student (as a measure of student 
input), the number of RAISE students being served in the core middle 
school (as an indicator of the size and dispersion of student participants), 
and notes to indicate which sponsors eliminated students nonrandomly 
from the program and other special circumstances. 

The use of one-on-one mentoring is not a consistent predictor of 
which RAISE sponsors produce the strongest positive results, although 
the set of three most effective RAISE sites includes two of the three 
sponsors with well-established mentoring components. Sponsors E 
and F are rated high on the implementation of mentoring and showed 
significant desirable impacts on their RAISE students. At the same 
time, the other effective sponsor (C) has the largest estimated RAISE 
impacts but did not use one-on-one mentoring at all. Clearly, other 
aspects of sponsor C's approach — such as the school-based advocate 
and the various activities between volunteer adults and RAISE stu- 
dents — were powerful enough to produce the observed benefits in 
student attendance and grades. Moreover, one sponsor (G) with a 
well-established mentoring component did not reflect significant average 
impacts in our analyses of differences between relevant RAISE and 
non-RAISE students. But sponsor G is a special situation among RAISE 
participants, in that students did not shift from an elementary school 
setting to a middle school environment until grade 7, which might 
explain difficulties in this case of producing RAISE effects at the end 
of year 2. The other three sponsors that failed to show consistent 
significant effects on student outcomes in our analyses did not have 
strong mentoring components, pairing only a very small number of 
individual students with mentors (sponsors A and D) or having no 
one-on-one mentoring for any students (sponsor B). Thus, although 
we find one exceptional sponsor with strong effects and no mentors 
and one exceptional sponsor with well-established mentors and no 
consistent effects, the results for the remaining five sponsors support 
a conclusion that one-on-one mentoring is an important RAISE com- 
ponent: three with weak mentoring showed no effects and two with 
strong mentoring showed significant positive effects. It seems reasonable 
to conclude that, although strong one-on-one mentoring is not an 
essential component of an effective program that uses outside adults 
to assist at-risk middle school students, the RAISE model is much 
more likely to show positive effects when one-on-one mentoring has 
been strongly implemented. 

Other aspects of RAISE implementation also remain as possible 
explanations for differential sponsor effectiveness, including the reading 
scores with which students enter the programs of the different sponsors 
and the reduction of the numbers of students to be served by certain 
sponsors. The association of student input differences with sponsor 
effectiveness is only suggestive, if the grade 5 reading scores and core 
sizes shown in table 3 are used as points of comparison. The two 
sponsors with the lowest average student input reading scores and 
with the largest number of students to be served in the core school 
(A and D) are also the two that show no consistent effects across any 
of the outcomes we examined and whose students remain most at risk 
in their further schooling. But, except for these two least effective 
sponsors, there is no other pattern between student inputs and sponsor 
effectiveness. While this evidence only suggests that RAISE effectiveness 
is more likely for smaller groups of students who do not begin far 
below grade level, other findings are also consistent with this conclusion. 

Table 3 also shows that the three most effective sponsors are the 
ones that had selected a nonrandom sample of actual RAISE participants 
from the original pool of eligible students. Although we cannot com- 
pletely rule out the possibility that these positive results were due to 
sample selectivity rather than actual RAISE program impacts, additional 
analyses support a more general interpretation: RAISE effects are 
more likely for sponsors who serve at-risk students with less severe 
initial educational disadvantages. When we repeated the analyses shown 
in tables 2 and 3 on the original samples that included all eligible 
students as RAISE participants, we found similar results. Adding all 
eligible students to the analysis did not substantially alter the effects 
of RAISE, even though only some of them participated in the program 
of three sponsors. As might be expected, the results were slightly 
smaller in size in these reanalyses, but these effects remained statistically 
significant in the same patterns reported in tables 2 and 3. 

Some practical implications are suggested when we combine our 
conclusion that sponsors who reduced their original samples had the 
largest nonspurious RAISE impacts with our previous observation that 
the two sponsors who began with the largest and most disadvantaged 
student groups showed no effects. It appears that the size and com- 
position of student groups to be served require major differences in 
resources if programs such as RAISE are to be successful. Programs 
that begin with student groups that are very large or greatly behind 
academically will have a much greater struggle to demonstrate positive 
effects. 



Further Implications for Research and Practice 

This evaluation is one of the first that uses comparison groups and 
statistical tests to judge the effects of a well-financed program using 
adult advocates or mentors. It provides some additional new perspectives 
on major practical and research questions. 

Both the potential and limitations of programs such as RAISE emerge 
from the evaluation results after two years of operations. We find it 
possible to help students make impressive gains in school attendance 
and in report card marks, but the average gains after two years are 
not sufficient to eliminate the academic risks with which students entered 
the program. Even after the RAISE benefits, the average student 
continued to have serious problems of absenteeism and low grades 
compared with the typical student in the district. And RAISE has not 
yet had measurable positive impacts on student standardized test scores 
or promotion rates in the middle grades. Nevertheless, the student 
behaviors where RAISE has been successful in its first two years can 
be viewed as steps in a sequence to improve students' academic chances 
as the program continues. 

School attendance, on which RAISE demonstrated a positive impact, 
is a behavior that is most open to short-term improvements and that 
can lead to advances in other school outcomes. Good attendance may 
be more completely under the control of individual students and more 
susceptible to positive influences by adult advocates or mentors than 
other school behaviors. In contrast to report card grades and promotion 
rates, for which teachers make the major decisions, every student can 
have good attendance. To be sure, some teachers can induce better 
student attendance through more engaging lessons and more positive 
relations with students. But any student with absenteeism problems 
can improve daily attendance with extra effort and effective support 
from family and meaningful adults. Effective encouragement by mentors 
or advocates to improve student attendance may require less training 
and program management than other support activities such as academic 
tutoring or negotiating with teachers or school officials on behalf of 
students. Thus, attendance rates seem to be a student outcome on 
which adult mentor and advocacy programs can focus to be effective 
in the short run. 

Good school attendance can often be a building block to other student 
behaviors required for school success. Students' attendance rates are 
often closely tied to their school report card grades, because many 
teachers will mark down students who have higher absenteeism levels 
than the rest of the class and some teachers will automatically fail 
students who have missed a significant portion of the term. Course 
failures due to poor attendance can lead to higher retention rates. 
Student learning as measured by standardized test scores can also be 
expected to suffer as a result of poor attendance, because absent students 
will miss instruction and engage in less drill and practice in the basic 
skills covered by tests. So a program of assistance by outside adults 
that focuses on improving student attendance may have a cumulative 
effect over time on other academic outcomes. 

Our findings that RAISE affects report card grades may be partially 
explained by the improvements in attendance that may lead to better 
grades. We found strong RAISE effects only for English grades, so it 
is likely that direct support for academic learning by RAISE adult 
advocates and volunteers was also responsible, through activities such 
as assistance with completing homework, tutoring in basic skill areas, 
or assistance in learning activities such as reading practice. More effective 
assistance with academic learning may require more specialized training 
of RAISE adults or more coordination with the ongoing school in- 
structional program, but our evaluation of RAISE suggests that direct 
academic activities can be a successful early part of programs using 
outside adult advocates or mentors. 

Major issues also emerge from our evaluation of RAISE about the 
successful implementation and coordination of program components. 
Programs that seek to make one-on-one mentoring a key component 
face several issues. Recruiting mentors and having them sustain suc- 
cessful relationships with at-risk students has been a major challenge 
for most RAISE sponsors. Some sponsoring organizations (such as 
the two participating churches) were able from the beginning to locate 
sufficient numbers of adults who committed to the mentor role and 
responsibilities. But most sponsors have taken more time to establish 
mentoring activities and are still working to recruit an individual mentor 
for every RAISE student. In some cases, sponsors have followed a 
process in which sets of adult volunteers engage in group activities 
with groups of RAISE students to assist in identifying one-on-one 
pairs that seem to be good matches for a sustained positive relationship. 
But when the requirements for mentors are major and regular com- 
mitments of time and energy, it will often take a long time before a 
sponsoring organization can build a cadre of committed mentors. During 
this time, other useful activities using adult volunteers in less demanding 
roles can be established by sponsors as part of their process of establishing 
and supplementing an effective mentoring component. 

The mentor's role is usually described as being a caring adult to 
support a student's efforts to succeed at major goals, but this conception 
raises questions of implementation and coordination with paid adult 
advocates in the program and with other adults in the student's school 
and home. Mentors are usually not intended to assume the supervisory 
and disciplinary roles of parents and teachers, but to provide a positive 
uncriticizing reference in the student's life. Yet, some of the student 
behaviors that appear most responsive to influence by outside adults, 
such as improved school attendance, may require adult monitoring 
and pressure that goes beyond the theoretical role of mentors or 
beyond the understandable preferences of some adults who actually 
fill these roles to avoid possible confrontations with their students. 
This raises the question of whether others such as the paid adult 
advocate might better handle the supervisory and disciplinary activities 
while mentors continued to focus on positive supports and incentives 
in their relationships with individual students, or whether both the 
definition and training of mentors should address the needs for both 
adult support and constructive criticism. 

Further evaluations of RAISE or other programs with similar com- 
ponents and goals are needed to learn whether effects become stronger 
as a program continues beyond its formative years and whether certain 
students respond better and benefit more from adult advocates or 
mentors as they move through the middle and secondary grades. 
RAISE managers are using our evaluation of the project's first two 
years to intensify and focus their efforts for the future. They expect 
one-on-one mentoring to gradually become available for most student 
participants in all but one sponsoring organization, and they intend 
to concentrate more on improving student school attendance and on 
working out arrangements with school officials to minimize retentions. 
Closer attention will be paid by both RAISE practitioners and evaluators 
to which students participate most in each type of RAISE activity to 
learn about individual differences in student responsiveness to RAISE 
offerings and to identify the RAISE components and activities that 
have the most impact on particular student outcomes. 



Notes 



This research was supported by The Abell Foundation, Baltimore, Maryland. 
The authors gratefully acknowledge the support and assistance of Kalman 
Hettleman, Director of the Baltimore Mentoring Institute, Richard Rowe, 
Director of Project RAISE I, and A. C. Hubbard, President of the Board of 
Trustees of RAISE, Inc. Opinions expressed herein are our own, and no 
endorsement from the sponsoring and supporting organizations should be 
inferred. 

1. To avoid problems sometimes associated with extreme values on ratio 
scales we eliminated students who either (a) missed more than 135 days of 
the 180-day school year or (b) were on the roll for less than 45 days. This 
truncation also obviates problems due to schools that may retain on the roll 
the names of students who likely have left the district, in order to gain an 
enrollment advantage for staffing calculations. We repeated all analyses using 
untruncated absence rates and observed no substantial changes in reported results. 
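
The truncation rule described in note 1 can be sketched in code. This is only an illustrative sketch and not the procedure actually used in the evaluation: the record fields (`days_on_roll`, `days_absent`), the function names, and the choice of days on roll as the denominator of the absence rate are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from note 1.
MAX_DAYS_ABSENT = 135   # students who missed more than this are dropped
MIN_DAYS_ON_ROLL = 45   # students on the roll for fewer days are dropped


@dataclass
class StudentRecord:
    # Hypothetical record layout, assumed for illustration only.
    student_id: str
    days_on_roll: int
    days_absent: int


def keep_for_analysis(rec: StudentRecord) -> bool:
    """Return True if the record survives the truncation in note 1."""
    return (rec.days_absent <= MAX_DAYS_ABSENT
            and rec.days_on_roll >= MIN_DAYS_ON_ROLL)


def truncate_sample(records: List[StudentRecord]) -> List[StudentRecord]:
    """Drop records with extreme absence or very short enrollment."""
    return [rec for rec in records if keep_for_analysis(rec)]


def absence_rate(rec: StudentRecord) -> float:
    """Absence rate as days absent over days on roll (assumed definition)."""
    return rec.days_absent / rec.days_on_roll
```

Running the same analysis once on `truncate_sample(records)` and once on the full list mirrors the sensitivity check that the note reports.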
Lessons From the Field: Case Studies 
of Evolving Schoolwide Projects 



Linda F. Winfield 

Center for Research on Effective Schooling for Disadvantaged Students 
The Johns Hopkins University 

This study describes changes that occurred in one of the nation's largest urban school systems on 
the East Coast following passage of the Hawkins-Stafford Amendments. Case study methods 
were used to describe the central office and system role and changes at the elementary school level 
in selected sites. A major emphasis of central office framework for schoolwide projects (SWPs) 
was school-based management and instructional frameworks based on effective schools research. 
The primary type of instructional intervention at the school level was reduction of class 
size during reading and math instruction. Schoolwide projects offer the potential for improving 
learning outcomes of disadvantaged students but require coordinated and direct support from 
the central office and district. 






The promise and potential of ESEA (Ele- 
mentary and Secondary Education Act) 
Chapter I for improving the school achieve- 
ment of economically disadvantaged stu- 
dents was part of the underlying rationale for 
its creation. This focus also contributes to its 
continued success among policymakers and 
practitioners as a categorical funding pro- 
gram. Both of these groups recognize the 
need to devote additional resources to assist 
schools serving student populations where 
high levels of poverty have a negative impact 
on schooling conditions and learning. Eval- 
uations of Chapter I (formerly Title I) pro- 
grams have been mixed but have generally 
failed to find substantial long-term achieve- 
ment effects for students receiving services 
(Carter, 1984). The variability of program 
effects, while due in part to methodological 
differences, is also due to the variation in the 
actual educational program implemented. 
Chapter I is a funding program that provides 
supplemental services to the regular school 
program. The typical mode of delivery of 
instructional services has been the "pullout." 
Previous research has documented the disruptive 
impact of pullouts, the waste of materials 
and time in trying to keep noneligible 
children from benefiting from Chapter I ser- 
vices, and the limitations on use of effective 
programs imposed by the principle that only 
test-eligible children may be served (Al- 
lington & Johnston, 1989; Glass & Smith, 
1977; Leinhardt, Bickel, & Palley, 1982; 
Winfield, 1986a). Moreover, in many of these 
schools belief systems develop among 
teachers and administrators in which they ab- 
dicate the responsibility for improving the 
learning of students receiving Chapter I ser- 
vices (Winfield, 1986b). 

Over a decade ago, case studies of 
schools in low-income urban and rural areas 
revealed that schools which "succeeded be- 
yond expectations in teaching reading" (Ven- 
ezky & Winfield, 1979) were those in which 
the principal and staff had made a conscious 
decision to improve the achievement of all 
students and had targeted high school-wide 
achievement as a goal. Principals in these 
schools included Chapter 1 (then Title I) as 
part of the overall strategy. Although compliance 
with federal regulations imposed certain 
restrictions, these schools operated a 
"student-centered" delivery of instructional 
services. This meant that all available build- 
ing-level instructional resources (Chapter I, 
district, or otherwise) were coordinated and 
targeted to support and reinforce student 
learning in the core instructional program 
rather than the more typical case where each 
supplementary program had its own meth- 
ods, materials, philosophies, and approaches. 
This latter scenario resulted in a fragmented 
instructional program for low-achieving stu- 
dents. 

The Hawkins-Stafford Amendments 
(1988), which allow the use of Chapter I funding 
for schoolwide projects (SWPs) in 
schools where 75% or more of the students 
are economically disadvantaged, are designed 
to reduce the fragmentation and to upgrade 
the entire school program. The flexibility in 
federal regulations comes at a time when the 
knowledge base has been advanced concern- 
ing effective schools (Purkey & Smith, 1983), 
the change process (Fullan, 1982), and suc- 
cessful programs in urban schools (Slavin, 
Madden, & Karweit, 1989). A major task 
confronting urban school systems and 
schools is how to make use of this new knowl- 
edge and also take advantage of the in- 
creased flexibility to improve the learning 
outcomes of low-achieving students. These 
opportunities come at a time when poverty 
has increased dramatically in major urban 
school districts (Wacquant & Wilson, 1989) 
and when contextual factors, such as size, 
demographics, diversity, density, a growing 
"underclass," the underground economy of 
drugs, the politics of school boards, and an 
eroding tax rate create uncertainty and tur- 
bulence in the school environment (Englert, 
1989). 

Urban School Systems 

Some researchers suggest that in response 
to uncertainty, schools and districts develop 
large, complex bureaucracies (Bidwell, 
1965) that are characterized by rigidity and a 
variety of dysfunctions (Levine, 1978). In 
large, urban districts, the response is charac- 
terized by a tendency toward disengagement 
from instruction (Meyer & Rowan, 1978). 
District and central office administrators are 
generally removed from what goes on in 
classrooms. In general, the governance and 
control of many schools serving disadvan- 
taged students are often fragmented by com- 
peting groups (e.g., unions, school boards, 
state departments, and special interest 
groups) and programs (e.g., Chapter I, mi- 
grant, special education, bilingual, curricu- 
lum and instruction, budget, personnel, and 
school operations). There is a high degree of 
role differentiation and specialization at the 
central level, and individuals and groups be- 
come territorial regarding their expertise, 
budget, and their constituencies. Services 
provided to the schools are seldom coordi- 
nated, and school building principals must 
deal with four or five central staff persons for 
a simple request. Central office administra- 
tors and policies at this level influence coor- 
dination of efforts and instructional collab- 
oration at the school level (Birman, 1981; 
Kimbrough & Hill, 1981). Thus, the purpose 
and intent of schoolwide projects which focus 
on upgrading the whole school program may 
be difficult for schools to attain, given the 
competing demands of central office 
groups. In general, when school districts re- 
organize central administration, the realized 
improvements are marginal and confounded 
with other simultaneous changes (March, 
1978). The tinkering with duties and position 
titles rarely has an impact on the instructional 
process in classrooms. The difference be- 
tween those districts in which schools suc- 
cessfully change and the typical school is the 
concept of "connectedness" (Wimpleberg, 
1989). Wimpleberg (1989) notes that it is 
unlikely that schools will act on their own to 
improve or that school systems will have re- 
sources to employ the needed specialists to 
assist in the process of change. That type of 
technical assistance must be provided from 
personnel rather than through paperwork 
(Eubanks & Levine, 1987). Other studies, 
however, have identified examples of strong 
leadership by large urban school superinten- 
dents who shape district-wide conditions for 
improving schools within a context of broad 
community support (Hill, Wise, & Shapiro, 
1989). 

The purpose of the case study was to describe 
changes in a major urban school system 
and schoolwide project schools following 
the passage of the Hawkins-Stafford Amend- 
ments. Between July 1989 and July 1990, 
central and district office SWP meetings and 
staff development sessions were attended, 
and one- to two-day site visits were con- 
ducted in 11 schoolwide project schools. 
School system documents, school SWP pro- 
posals, and other reports were examined. A 
more thorough description of this study can 
be found in Winfield and Stringfield (1990). 

School System Context 

The Chapter I program in a major urban 
school system serves 162 schools and receives 
approximately $50 million in Chapter I funds 
annually. Since 1983, various initiatives tar- 
geted toward improving the achievement of 
Chapter I schools had been initiated by the 
superintendent, who can be described as a 
demanding instructional leader. She manages 
a $1 billion budget yet places the education 
of the 250,000 children ahead of everything 
else. One of the past initiatives targeted 
the improvement of 26 Chapter I schools over 
a three-year period beginning in 1983. Funds 
from a private foundation and Chapter I 
funds were used to support a school-based 
planning and implementation process. As 
the third year of the project began, the cen- 
tral office felt that in some schools additional 
human resources were needed to change the 
historical patterns of low student achieve- 
ment which existed. Thus, for example, 
teachers were hired to staff full-day kinder- 
gartens, and a permanent substitute was as- 
signed to the schools. For the 1986-1987 
year, the superintendent opted to designate 
11 of these schools previously targeted as 
schoolwide projects and to pay the matching 
share then required for noneligible students 
who were receiving services. When the 
Chapter I guidelines were changed in 1988, 
the school system expanded the number in 
the program rapidly. Currently, approx- 
imately half of all of the elementary schools 
in this system are SWPs. 

At the same time these initiatives were 
under way, a system-wide Chapter I Task 
Force that had been meeting since 1987 made 
a recommendation based on student outcome 
data to expand SWPs. The task force 
consisted of all of the major special interest 
groups and stakeholders (e.g., central office 
staff from budget, special education, curricu- 
lum, and compensatory programs, as well as 
district superintendents, teachers, and prin- 
cipals in Chapter I schools). The major task 
was to develop a comprehensive compensatory 
program designed to improve student 
achievement, which would be phased in over a 
two-year period. One former task force 
member interviewed said: "It was a working 
group . . . it brought everybody to the table 
. . . we didn't always agree but we knew 
that something had to be done to improve 
. . . that's the bottom line." 

The school system's approach to SWP identifies 
five main thrusts: (a) a whole-school 
approach which supports student success in 
the daily program, provides special support 
for students who require it, and is based on 
the "effective schools" research; (b) school-based 
management, which requires that the 
school staff and parents determine the nature 
of the intervention within specified program 
guidelines and contractual requirements 
(Chapter I funds are provided to each school 
as a block grant, with totals averaging about 
$250,000-$300,000 or $900/pupil); (c) monitoring 
of individual student, class, and school 
performance on an ongoing basis, giving 
particular attention to those students targeted 
for intensive services and those who 
would be designated as Chapter I eligible 
should they attend a nonschoolwide project; 
(d) district-based support provided by the 
central and subdistrict offices to provide parent 
and staff training on an "as requested" 
basis (this support targeted leadership development 
and team building, ongoing leadership 
team meetings for principals and key 
staff, and monitoring of school improvement 
plans); and (e) concentration of resources, which 
indicates that funds beyond the minimum 
amounts would be committed from Chapter 
I and operating budgets. 

As part of its commitment to SWPs, the 
central office designated an office of school- 
wide projects with a director and manager to 
develop and oversee the implementation. 
Mr. D., the selected manager, had had no 
prior central office experience but had been a 
highly successful elementary principal for 
some years and was credited with "turning 
around" the dismally low performance in an 
extremely impoverished urban school. Mr. 
D. indicated: 

From my own experience to change what's 
happening in schools, staff development for 
principals is critical because most of them 
don't know what to do . . . teachers have to 
be supported because many of them are 
scared to change, and direct services to the 
school have to be expanded but also coordi- 
nated. 

Throughout his tenure, he developed the op- 
erational guidelines for implementing SWP 
but was also the role model for principals and 
the chief advocate for students, teachers, and 
principals in schoolwide projects. On many 
occasions, he indicated that schools needed 
to undergo an "awareness and orientation 
phase" that primes them for changes in how 
they traditionally deliver services to Chapter 
I eligible students and how to participate in 
decision making on a schoolwide level. 

System-Level Support for School Change 

According to Mr. D. and other principals 
in SWP sites interviewed, school-based plan- 
ning and site-based management are proc- 
esses which are not easy to carry out effec- 
tively. Thus, principals and teachers require 
continual coaching, encouraging, admonish- 
ing, recognition, and incentives in order to 
get them to "buy into" the process and to 
implement a schoolwide intervention. Princi- 
pals and teachers have traditionally selected 
instructional materials and made decisions 
about a particular program or focus; how- 
ever, few in SWP sites had been involved with 
making decisions which affected the whole 
school or with reaching a consensus concern- 
ing decisions, such as hiring an additional 
math resource teacher or eliminating the 
reading laboratory. 

There were newly created positions of spe- 
cialized SWP personnel to assist schools with 
these decisions in their leadership team meet- 
ings. These persons were knowledgeable about 
change and about the instructional process, 
and school-based management provided the 
services of internal change agents. A position 
titled "Instructional Interventionist" served as 
a liaison between the subdistrict and the SWP 
site. The interventionists were action-oriented 
and participated in principal-led monthly leadership 
team meetings in each SWP school, and 
they organized ongoing staff development and 
cross-school sharing for principals and staff. In 
addition, they coordinated, directed, and pro- 
vided staff development for instructional sup- 
port teachers; provided assistance to the prin- 
cipal in arriving at a workable school im- 
provement plan; and ensured that all materials 
and supplies purchased were related to the 
school's detailed instructional improvement 
plans. 

The instructional support teachers (ISTs) 
held teacher-level positions in each subdistrict; 
each IST was responsible for overseeing 
two SWPs and worked directly with 
the principal and school personnel. They 
were in each school two to three days a week, 
depending on needs at the individual SWP 
site. They served as a "trouble shooter" and 
an implementation coach for the principal. In 
addition, they provided staff development 
and worked with their school-based counter- 
parts, the program support teacher (PST). 
These teacher-level positions were based at 
the school and were selected from the school 
staff by the principal. The PSTs were consid- 
ered by their peers as a "master" or "men- 
tor" teacher. They provided instruction to 
students for 90 minutes a day and spent the 
remainder of the time working directly with 
the principal, new teachers, and other staff in 
implementing the schoolwide plan. They 
monitored student progress, participated in 
leadership team meetings, and did demon- 
stration lessons. Interviews with these staff 
indicated that their positions were "labor 
intensive, demanding and required long 
hours," yet they were highly sought after as 
evidenced by the number of teachers apply- 
ing and taking the oral and written test. In 
interviews regular classroom teachers indi- 
cated that the positions were "higher status" 
and meant to them "master teacher" even 
though PSTs and ISTs received the same salary 
as a regular teacher. The system of specialized 
positions was dynamic, one in which 
new talent was constantly sought as veteran 
program support teachers were promoted 
into IST positions, and ISTs moved up to 
instructional interventionists. Several of the 
instructional interventionists became building 
principals. 

Instructional Frameworks 

Another major objective of implementing 
SWPs was to change from a traditional "pull- 
out" model to a whole-school instructional 
focus. The central office developed and pro- 
vided staff development in four instructional 
frameworks from which schools could choose 
in order. The frameworks were general and 
included factors such as high expectations, 
monitoring, positive school climate, and 
team work. They also included, however, 
classroom-based strategies such as coopera- 
tive learning, active teaching and learning, 
and effective lessons. Attendance at staff 
development was highly encouraged, and 
teachers were paid; however, it was volun- 
tary. Thus, the use of the diverse frameworks 
varied across school sites. In some sites, the 
framework was not a salient part of the ob- 
served school program. In other sites, the 
framework provided a common language for 
staff to use in discussing students and instruc- 
tional matters, or it served to facilitate team 
building. In these settings, the frameworks 
helped to create a sense of community 
among staff, allowing them to coalesce 
around common goals. This was particularly 
evident during staff development sessions 
when principals and teachers from all SWP 
sites were divided into groups on the basis of 
the framework adopted. The particular in- 
structional framework itself was not as im- 
portant as allowing principals and teachers to 
select and adapt one which they felt was most 
appropriate to their school. 

Parent Involvement 

Another concern in SWP sites was to in- 
volve parents in the educational process of 
their children. Each school's SWP proposal 
was required to delineate ways in which the 
site would conduct parent involvement activ- 
ities. Schools were also required to include in 
their budgets funding for a school community 
coordinator. This individual initiated strate- 
gies to improve attendance. At several sites, 
he or she was responsible for implementing a 
daily system of identifying all absent students 
in order to make immediate contact with the 
home. He or she also coordinated and directed 
parent workshops over the school 
year. "Parent scholars" (parents from the 
community who assisted in the classrooms) 
were also funded out of SWP budgets. These 
assistants were provided with a modest sti- 
pend and worked in 10-week cycles. Parents 
were observed assisting in classrooms, the 
library, computer lab, and lunchroom. Each 
SWP site was provided with a parent trainer, 
who visited the site regularly to assist in re- 
cruiting and training community assistants 
and to assist in other parent involvement ac- 
tivities. Each site also had a trained home 
demonstrator, whose sole purpose was to 
make home visits and to work directly with 
parents on learning readiness, on helping 
their child with homework, and on other 
school-related activities. These personnel 
provided systematic support and increased 
the number of parents involved in SWP 
school activities. 

What Are the Major Types 
of Interventions? 

The 11 SWP sites in the sample used their 
Chapter 1 funds in a variety of ways. In the first 
years, some of the funds were used to purchase 
needed materials such as science kits, math 
manipulatives, and classroom literature li- 
braries. One site used funds to extend the 
school year by 22 days. Nearly all of the schools 
established an additional teaching position to 
lower the teacher-student ratio during math 
and reading instruction. Approximately half 
reduced class size in classes with the lowest 
achieving students. In over half of the SWP 
sites, the additional teaching position elimi- 
nated split grade classes. In some schools, the 
program support teacher, SWP reading and/or 
math resource specialist, and basic skills 
teacher provided the entire lesson to the whole 
class on a scheduled basis. Other schools devel- 
oped team teaching models in which SWP per- 
sonnel taught in the classroom with the regular 
teacher. 

What Does a SWP Look Like? 

School A 

Constructed in 1937, School A is a small 
school building in the middle of a neighborhood 
that is rapidly undergoing regentrification. 
A few blocks to the east are new town 
homes and renovated row houses selling for 
$250,000 and up. A few blocks to the west are 
the remnants of three high-rise-project build- 
ings that are slated for demolition. Because 
of population shifts, the school enrollment 
has declined from 700 to 397. Ninety percent 
of the students in the school are eligible for 
free lunch. Although the school population is 
predominantly Black (81%), there are also 
Asian (3%), Hispanic (4%), and White 
(12%) students attending the school. The 
school enrolls children from kindergarten to 
eighth grade and has 13 regular grade classes 
and 10 special education classes. Moderately 
and severely handicapped children are bused 
from outside the immediate school neighbor- 
hood. The staff have devised activities to in- 
tegrate many of these students into as much 
of the school day as possible. 

Although it is not officially a magnet 
school, it enjoys a "good reputation," and 
according to the current principal, parents 
are "clamoring" to get their children en- 
rolled. This school finished among the top 10 
in science and mathematics in the district and 
has had a full-time science room with a sci- 
ence teacher for only the last four years. As 
one enters the school, it becomes imme- 
diately apparent from the school banner, dis- 
plays of students' work, trophies, and awards 
that someone has fostered and maintained a 
sense of school pride and spirit. In the school 
office, a two-page handout for substitute 
teachers is noticeably displayed and provides 
essential information on lesson plans, roll 
book, homework, lines, classroom manage- 
ment, school procedures, and academic 
notes on each subject area. The first line 
reads: "We are a school-wide project 
school." 

Mr. A., the principal, is completing his 
first year at School A, but he has been a 
principal in urban schools for the past 22 
years. He credits the success of the SWP to 
the former principal, who had created the 
basic idea. He felt that the plan was "teacher 
intensive," indicating that the funds were 
used primarily for teachers to reduce class 
size in reading, and he said, "We're not lack- 
ing for materials, but most of the money was 
spent on personnel. In the lower grade, two 
teachers assist in the primary grade reading 
cycle. One works with the upper grade read- 
ing/language arts." He attributes the success- 
ful implementation of the plan to the ongoing 
staff development and an active pupil sup- 
port committee, which meets twice a month 
to discuss alternative interventions for indi- 
vidual students having academic problems. 
He said, "Having paid staff development and 
meeting time for pupil support committee 
meetings has been a great advantage." He 
described the staff as "stable and very strong- 
willed but capable and very caring" of the 
youngsters that they taught. 

Teachers interviewed indicated that since 
becoming a SWP, the biggest change was that 
all of the faculty provided input into the plan. 
Other teachers indicated the increased flex- 
ibility as an advantage. One teacher said, "I 
never liked the pullout model .... the coor- 
dination makes sense . . . they're not freight 
packages — they're children." Another teacher 
noted: "Before, the classroom assistants 
could only teach certain students; now they 
can deal with all of the kids." The teacher 
who also chairs the "Climate" committee for 
the school credited the paid meeting time for 
the pupil support committee, which he indi- 
cates "allows us to be more systematic in 
finding out and doing something with kids 
who are having problems. . . . These 
teachers here really care . . . this place is 
like one big family, and we support each 
other." 

According to the program support teacher, 
Mrs. Bee, who had been at the school for 22 
years, the "thinking skills" was an area that 
staff decided to work on even prior to becom- 
ing a SWP. Their decision to adopt the frame- 
work on thinking skills, she indicates, was an 
easy one. Mrs. Bee explained that although 
she is "teacher of record" for a group of 12 
kindergarteners that are reading, and 15 of 
the lowest achieving first graders, the major- 
ity of her time is spent in various classrooms 
conducting demonstration lessons or co- 
teaching with teachers. She explains that 
"teacher of record" is a SWP concept mean- 
ing that the person who provides the instruc- 
tion in reading, for example, is also responsi- 
ble for monitoring and improving student 
progress and grading. Her other responsibilities 
include assisting classroom teachers. 
She says: 

If a teacher is absent, I'll go in during the 
reading period so that the reading instruc- 
tion is not disrupted. We have a new teacher 
in the school, and I was in her class conduct- 
ing the reading lesson for the first month or 
so . . . since also, along with Mrs. G, the 
other reading teacher, I give an informal 
reading assessment to all students 3 times a 
year so that some don't fall through the 
cracks. 

She viewed an important aspect of her position 
as monitoring and said that SWP requires 
a lot of paperwork; however, she indicated, 
"teachers have all the information on 
individual students in one place, the SWP 
record book — grades, end-of-unit tests, city- 
wide test, teacher-made tests, homework as- 
signments. I collect these every 6 weeks from 
each teacher and review them." 

The school's focus on integration of special 
education students, as well as commitment to 
teach all students, was observed during the 
reading cycle. An excerpt from the observa- 
tion follows: 

Mrs. Bee began her routine that apparently 
all of the children know. It is a song with 
hand motions that the children do that cap- 
tures their attention and tells them to put 
their thinking caps on. She introduces the 
lesson by saying, "Today we are going to 
talk about beginning sounds," and draws a 
picture of a hat on the board and writes 
— at. Now what belong in this space? The 
children respond "H" and she writes it in. 
She continues to introduce word families 
and sounds that will later be used in a "big 
book" story she reads. She is animated, 
moves around the room, calling on the 
whole group and individual children, in- 
cluding the special ed students, to respond 
to provide the beginning sounds for the pic- 
tures and word families on the board. 

One would not have known that the students 
in this classroom were classified as "trainable 
mentally retarded" except for the size and 
age of the youngsters seated in the back and 
one little girl who imitated the behavior of 
the other children but couldn't understand. 
Still she raised her hand to respond to questions 
and tried to write the letters which upon 
observation were unintelligible scribbles. 
Mrs. Bee praised her for trying and later told 
me that "she really tries, but her problems 
are too severe." 

School B 

School B, built in the 1970s, is a factory- 
like structure that takes up nearly a city block 
and overshadows the small rowhouses in its 
immediate neighborhood. The school is sur- 
rounded on one side by a large outdoor play 
area, and in the front by a mixture of both 
well-kept and decrepit boarded-up row 
houses. An influx of young families with 
small children in the area has caused an in- 
crease in the school's enrollment. At the time 
of the visit, the school was past the capacity 
(900) of the building. Seventy-one percent of 
students are Hispanic, 21% White, and 7% 
Black. Every available space at the school is 
filled. One kindergarten class and one class 
of third graders use rented space in an adja- 
cent church building. The science, art, and 
music rooms have been converted to class- 
rooms. Specialist teachers go from classroom 
to classroom. The small conference room 
also functions as a lunch room for teachers. 
One is immediately overwhelmed by the 
sheer size of the building with its huge hall- 
ways and extremely high ceilings with 
exposed pipes. Despite the massive number 
of students, entrance, dismissal, and a fire 
drill were experienced without chaos and 
were quite orderly. Prominently displayed in 
the lobby is the "Creating Success" logo, the 
instructional framework chosen by the 
school. 

According to the school community coor- 
dinator, a young bilingual Hispanic female 
who has lived in the neighborhood for 15-20 
years, the big change in the neighborhood 
occurred about 5-6 years ago. "Many stable 
families who could afford to move left the 
area ... we have a lot of young families with 
many children — some who have just come 
over from Puerto Rico ... we also lose kids 
whose families return." In addition to her 
home visits requested by teachers and parent 
workshops, a substantial amount of time is 
spent referring parents to community agencies, 
taking parents for appointments, translating 
for parents who don't speak English, 
providing clothes and emergency shelters, 
and interpreting report card marks. She had 
served as home and school president prior to 
becoming a school community coordinator. 

Mr. C., the principal, is a high-energy, fast-paced, 
organized, and task-oriented individual. 
He stands in the hall and greets each 
child by name, handing out small rewards for 
good behavior, for perfect attendance, and 
for reading. He indicated that the school as a 
whole is challenged to read one million pages 
between September and June. He said that 
becoming a SWP had allowed him more 
flexibility in how he could use his staff and 
that he had stressed co-teaching models. In the 
school, SWP teachers team taught with grade 
teachers to reduce the student-teacher ratio 
during language arts periods. The math re- 
source teacher and assistant teamed with 
three grade teachers to reduce the student- 
teacher ratio for math instruction for one 
hour each day. The ESOL teacher worked in 
the classroom with ESOL students. Funds 
also paid for planning time for the ESOL 
teacher to plan with grade teachers. Mr. C.'s 
background is in reading/language arts, and 
he teaches during one class period each day. 

He states: "The only way children learn to 
read is through reading . . . through fre- 
quent, positive interactions with a variety of 
meaningful texts. They learn to construct 
meaning. . . . The developmental process 
is supported by systematic, explicit instruc- 
tion in phonics/word attack skills. We in- 
creased the amount of time from 1 hour to 
90 minutes, emphasizing literature-based 
instruction and thematic unit planning. Be- 
cause of SWP, we reduce class size and stu- 
dent-teacher ratio during reading/language 
arts instruction. We also have on-site staff 
development in implementing [a] literature- 
based reading program provided by a local 
university." 

Teachers receive 3 graduate level credits. 
He indicates that this has been very success- 
ful in helping teachers to learn how to imple- 
ment literature-based instruction. 

Other teachers interviewed noted other 
benefits of being a SWP. One teacher indicated: 
"We are a big school, and we have a lot 
of funds poured into us, but we were told this 
is how you have to spend it regardless of 
whether students needed it or not." Another 
said: "Now we're able to get more personnel. . . . 
We used to have classes ability 
grouped, and some classes were not Chapter 
1 eligible but were still in need of additional 
help. Before, I couldn't serve them." Other 
teachers noted that paid meeting times, on- 
site training, and availability of funds to pur- 
chase sets of literature books and expand 
classroom libraries were important benefits 
of being a SWP. 

Conclusions 

From the brief descriptions provided here, 
one can note that SWP schools in the process 
of change look much like other schools that 
are making conscious attempts to improve 
classroom instruction and improve upon 
existing programs. Because of the high con- 
centration of poverty, many of the schools 
are plagued by staff vacancies, operating 
budget cuts due to declining enrollments, 
and high student mobility. However, with the 
assistance provided by SWP personnel, the 
schools are grappling with issues, such as how 
to make schoolwide decisions, how to create 
effective working plans for improvement, 
how to integrate other existing categorical 
programs into a coherent instructional pro- 
gram, how to allocate the available resources 
effectively, how to provide on-going support 
to classroom teachers, and how to deliver 
higher quality instruction to disadvantaged 
students. 

In order for schools to change from a traditional 
Chapter 1 program to a more integrated 
focus on all students, parallel changes 
must be made at the central office. Not only 
must these offices become more "connected" 
to the instructional process, but they 
must also be organized in such a way as to 
provide effective coordination and delivery 
of direct service to schools becoming SWPs. 
School systems will also have to invest 
heavily in human resources and professional 
development at all levels. High-poverty 
schools, such as SWP sites, have tremendous 
needs for direct, on-site, systematic assistance 
in changing existing structures and negative 
belief systems; more intensive professional 
development in collaborative teaching 
models and subject-matter instruction; more 
proven high-quality educational interven- 
tions for students experiencing academic dif- 
ficulties; and strong reciprocal agreements 
with teacher training programs to aid in re- 
cruitment and development. Schoolwide 
projects have the potential for improving the 
learning outcomes of large numbers of disad- 
vantaged students. However, this potential 
will be met only if adequate support for 
change is provided at the central or district 
level and if sufficient resources are devoted to 
human resources and professional develop- 
ment. 

Note 

A version of this paper was presented at the 
National Conference on Educating Black Chil- 
dren, June 2, 1990, Los Angeles, California. 

Policy Issues: Chapter 1 

Modifying Chapter 1 Program Improvement Guidelines 
to Reward Appropriate Practices 



Robert E. Slavin and Nancy A. Madden 

Center for Research on Effective Schooling for Disadvantaged Students 
The Johns Hopkins University 

New accountability guidelines have helped to focus educators on the outcomes of Chapter 1 
programs, but they may also be rewarding counterproductive practices. They may discourage 
early interventions, such as preschool, kindergarten, and first-grade programs, which increase 
the baseline for later gains. They may reward retentions, which significantly increase apparent 
normal curve equivalent (NCE) gains. They may focus teaching on narrow, easily measured 
objectives. This article proposes an alternative approach to Chapter 1 accountability which 
rewards schools for reducing the number of students who fail to meet minimum standards on 
broad-based, appropriate tests. Retained or untested students would be counted as not meeting 
minimum standards. Program improvement services would be greatly increased and made 
available to all Chapter 1 schools. Advantages and problems of this system are discussed. 






While Chapter 1 and its predecessor, Title I, 
have always been service-delivery programs, 
they have also been accountability programs, 
requiring districts to evaluate and report the 
progress of Chapter 1 students on achieve- 
ment tests. These accountability require- 
ments have had a major impact on account- 
ability procedures used by school districts for 
all students. The 1988 Hawkins-Stafford bill 
introduced new methods for evaluation of 
Chapter 1 programs and new roles for state 
and local education agencies tied to these 
evaluations. The changes are subsumed un- 
der the term program improvement. The in- 
tention of program improvement is to iden- 
tify schools in which Chapter 1 students are 
not making adequate progress toward grade- 
level performance and to require these 
schools to reformulate their plans. 

In concept, the idea of program improve- 
ment is a major step forward. For the first 
time, Chapter 1 is putting a major emphasis 
on the nature and quality of programs pro- 
vided to children and the outcomes of these 
programs. The program improvement guidelines 
are surely identifying some schools 
which are, in fact, doing a poor job with low- 
achieving children and are giving them both 
an incentive to change and some assistance in 
doing so. Program improvement is also giv- 
ing state departments of education more of a 
role in assuring program quality, as opposed 
to a primary emphasis on fiscal and regula- 
tory monitoring (see Plunkett, 1991, this is- 
sue). Yet the very importance of the new 
program improvement guidelines places an 
added responsibility on them to be certain 
that they are fair and valid, and most impor- 
tant, that they reward schools for appropriate 
policies and practices. The purpose of this 
article is (a) to examine key aspects of program 
improvement guidelines to attempt to 
determine the degree to which they are likely 
to promote positive changes in school poli- 
cies and practices and (b) to propose an alter- 
native system designed to avoid the problems 
in the current one. 

The identification of schools as being in 
need of program improvement is almost always 
based on the calculation of gains in normal 
curve equivalents (NCEs) from spring to 
spring or from fall to fall.¹ In principle, students 
who make the same progress as a test's 
norming population will receive the same 
NCE score each year (just as they would the 
same percentile rank). Most states have set a 
criterion for success of an average gain of 
more than zero and as many as three NCEs 
(see Heid, 1991, this issue) in each school, on 
the principle (laid out in the federal regula- 
tions) that Chapter 1 students, who are by 
definition performing below grade level, 
should be gaining on the national norming 
group to head toward grade-level perfor- 
mance. 

NCE gains are computed from a baseline 
established from testing at the end of the 
previous grade. For example, a student who 
scores at an NCE of 30 in spring of first grade 
and 30 at the end of second grade would be 
considered to have made no gain; less than 
this is referred to as "negative gain." Of 
course, students whose scores are at the same 
NCE each year have gained in achievement, 
but they have not gained in comparison with 
the test's norming population. 
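
The NCE arithmetic described above can be sketched in a few lines of Python. This is an illustration only, assuming the standard definition of the NCE as a normalized score with a mean of 50 and a standard deviation of 21.06; the function names `percentile_to_nce` and `nce_gain` are hypothetical and do not come from the Chapter 1 regulations.

```python
from statistics import NormalDist

# Assumed scaling constant: with this standard deviation, NCEs of
# 1, 50, and 99 line up with percentile ranks of 1, 50, and 99.
NCE_SD = 21.06


def percentile_to_nce(percentile: float) -> float:
    """Convert a national percentile rank (0 < percentile < 100) to an NCE."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return 50.0 + NCE_SD * z


def nce_gain(pretest_percentile: float, posttest_percentile: float) -> float:
    """NCE gain between two testings, each expressed as a percentile rank."""
    return percentile_to_nce(posttest_percentile) - percentile_to_nce(pretest_percentile)


# A student who holds the same percentile rank from one spring to the
# next has kept pace with the norming population and shows no NCE gain.
same_rank_gain = nce_gain(20, 20)
```

Under these assumptions, a student at the 20th percentile in both springs shows a gain of zero, even though the student has learned a year's worth of material.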

In addition to NCE gains, states and dis- 
tricts are encouraged to include standards for 
"desired outcomes," such as reduced reten- 
tions, increased parent participation, or im- 
proved early childhood outcomes. However, 
because failure on any one of these can place 
a school in program improvement, most 
states and districts have either avoided "de- 
sired outcomes" or have made them easy to 
achieve (see Martinez, 1991). 

In theory, the program improvement stan- 
dards are sensible, in that they focus schools 
on the outcomes of instruction, not only on 
compliance with regulations regarding pro- 
gram operation. They also avoid the well-docu- 
mented problems of fall-to-spring testing 
which plagued earlier Chapter 1 evaluation 
procedures. In practice, the new standards 
provide incentives for schools to improve 
their Chapter 1 programs, but they also cre- 
ate a few incentives which run counter to the 
intentions of the law and to any standard of 
common sense. 

Not surprisingly, schools and school dis- 
tricts regard identification for program im- 
provement as bad news. In school districts in 
which Chapter 1 plays a major role, many 
principals feel that identification of their 
school as being in need of program improve- 
ment will impact negatively on their careers, 
and in some districts this link is made explic- 
itly by district administration. Clearly, most 
schools will be motivated to avoid being iden- 
tified as in need of program improvement. 

Most Chapter 1 schools probably try to 
avoid being identified for program improve- 
ment by attempting to improve the quality of 
their programs, as was intended in the legis- 
lation. However, the program improvement 
guidelines contain a few serious flaws which 
have the unintended effect of punishing 
schools for investing in early intervention 
(i.e., preschool, kindergarten, and first- 
grade programs) and rewarding them for re- 
taining students and teaching a narrow set of 
skills. This article discusses these and other 
flaws in program improvement guidelines 
and proposes alternatives which would retain 
the positive features of the approach while 
eliminating these counterproductive fea- 
tures. 

Flaws in Program Improvement Guidelines 

Punishing Early Intervention 

One of the most important themes raised 
by reformers of Chapter 1 is the idea that 
compensatory services should shift from an 
emphasis on remediation to an emphasis on 
prevention and early intervention (see, for 
example, Slavin, Karweit, & Madden, 1989). 
In recent years Chapter 1 dollars have in- 
creasingly been used to provide preschool, 
extended-day kindergarten, or intensive in- 
tervention in first grades for at-risk students, 
on the theory that it makes more sense to see 
that students begin with and maintain success 
than to let them fall behind in basic skills and 
only then provide remedial services. Pro- 
grams such as Reading Recovery (Pinnell, 
1989) and Success for All (Slavin, Madden, 
Karweit, Livermon, & Dolan, 1990), both of 
which provide one-to-one tutoring to at-risk 
first graders, have been highly effective in 
ensuring adequate reading skills among at- 
risk first graders, gains which have been 
maintained in later grades. 

Yet program improvement standards may 
inadvertently punish schools for investing in 
any kind of early intervention; in fact, the 
more effective the early intervention, the 
more the school may be punished. The prob- 
lem is that NCE gains are measured from a 
baseline established at the end of first grade. 
If students score very well at the end of first 
grade, this may make it more difficult to show 
continued NCE gains in Grades 2 and beyond. 
That is, a school which invests in preschool, 
kindergarten, or first-grade intervention may 
increase first-grade scores and therefore un- 
dermine its own program by having limited 
gains in the later grades. Regardless of whether 
or not schools continue to show gains in the 
later years, the impact of their early interven- 
tion will not be seen in the scores that matter 
most to Chapter 1 administrators. 

To illustrate this, consider two identical 
schools with identical historical distributions 
of student scores. One school, Lowenslow 
Elementary, provides no Chapter 1 services 
until the second grade, at which time it offers 
traditional pullout services. The other, 
Brighton-Early Elementary, invests in ex- 
tended-day kindergarten and one-to-one tu- 
toring for the lowest-achieving first graders. 
Imagine that the effect of these interventions 
is to raise the achievement of at-risk students 
at Brighton-Early by 75% of a standard deviation, 
equivalent to an NCE gain of 16. Gains 
of this size are typical of studies of early 
intervention programs such as Reading Re- 
covery (Pinnell, 1989) and Success for All 
(Madden, Slavin, Karweit, Dolan, & Wasik, 
1991). 

Table 1 shows a hypothetical distribution 
of scores for at-risk students at the two 
schools at the beginning of kindergarten (as- 
sume for the sake of argument that NCEs 
could be reliably measured at this grade 
level). "At-risk" is defined in this case as 
performing below an NCE of 30, the criterion 
for Chapter 1 services in both schools. Before 
any intervention, there are 20 kindergartners 
in each school scoring below 30 NCEs. 

By the end of first grade, Lowenslow students 
have received no intervention, so they 
have stayed at the same performance level. In 
contrast, the 16 NCEs gained by Brighton- 
Early students have pushed all but six of the 
students over the criterion for Chapter 1 ser- 
vices. The NCE mean for all 20 students is 
now 35.3, but for the six remaining Chapter 
1-eligible students it is 23.8 (see Table 1). 
This is now the baseline for NCE gains in 
Grades 2 and beyond. If at-risk students at 
Lowenslow and Brighton-Early then make 
gains of 3 NCEs each year, they will look (to 
Chapter 1 evaluations) like equally effective 
schools. Yet obviously they are not. At best, 
Brighton-Early's investment in early inter- 
vention does nothing to help it look good in 
the most important Chapter 1 evaluations 
(NCE gains in Grades 2 and up) and may 
increase its chances of ending up in program 
improvement by raising its end-of-first-grade 
baseline. The Chapter 1 assessment guide- 
lines would fail to take notice of the most 
remarkable achievement of Brighton-Early's 
program: the fact that it dramatically in- 
creased end-of-first-grade performance and 
substantially reduced the number of students 
in need of remedial services. 
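
The comparison can be checked with a short Python sketch. It is a simplification under stated assumptions: it uses only the six lowest kindergarten scores listed in Table 1 (14, 12, 9, 7, 4, and 1), it treats a score of exactly 30 as still eligible (which the count of six remaining students implies), and the function name `evaluated_gain` is hypothetical.

```python
CUTOFF = 30        # assumed inclusive: a score of exactly 30 still counts as eligible
EARLY_BOOST = 16   # NCE effect of early intervention at Brighton-Early
YEARLY_GAIN = 3    # NCE gain both schools make during Grade 2


def evaluated_gain(grade1_scores, grade2_scores, cutoff=CUTOFF):
    """Return (baseline mean, mean gain, number evaluated) as a Grades 1-2
    evaluation would see them: only students at or below the cutoff at the
    end of Grade 1 enter the calculation."""
    pairs = [(pre, post) for pre, post in zip(grade1_scores, grade2_scores)
             if pre <= cutoff]
    baseline = sum(pre for pre, _ in pairs) / len(pairs)
    gain = sum(post - pre for pre, post in pairs) / len(pairs)
    return baseline, gain, len(pairs)


kindergarten = [14, 12, 9, 7, 4, 1]

# Lowenslow: no early intervention, so Grade 1 scores equal kindergarten scores.
lowenslow_grade1 = list(kindergarten)
lowenslow_grade2 = [s + YEARLY_GAIN for s in lowenslow_grade1]

# Brighton-Early: the same students after a 16-NCE boost by the end of Grade 1.
brighton_grade1 = [s + EARLY_BOOST for s in kindergarten]
brighton_grade2 = [s + YEARLY_GAIN for s in brighton_grade1]

lowenslow = evaluated_gain(lowenslow_grade1, lowenslow_grade2)
brighton = evaluated_gain(brighton_grade1, brighton_grade2)

# Gain since kindergarten, which the Grades 1-2 evaluation never sees.
brighton_total_gain = EARLY_BOOST + YEARLY_GAIN
```

For these six students both schools report a gain of 3.0, and Brighton-Early's baseline is about 23.8, as in Table 1. The 16 NCEs gained before the end of first grade appear nowhere in the evaluated figure. (Lowenslow's baseline in this sketch is lower than the 19.3 in Table 1 because only six of its 20 students are included.)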

The Chapter 1 Policy Manual does allow 
schools to submit data other than stan- 
dardized test scores to evaluate preschool, 
kindergarten, and first-grade programs, but 
it forbids averaging any of these measures 
with those given in Grades 2 and up. This 
means that a school could have a very effective 
early intervention program (and have 
data to support it) but could still be identified 
for program improvement (on the basis of the 
standardized test scores from the higher 
grades). 

Using end-of-first-grade 
scores as a baseline solved one problem inherent 
in earlier Chapter 1 evaluation systems, 
the relatively low reliability of end-of-kindergarten 
or early first-grade scores. 
However, flawed as it was, the earlier system 
at least communicated to schools that first- 
grade progress would count toward their suc- 
cess as a Chapter 1 school. In our own Suc- 
cess for All program (Madden et al., 1991), 
at least one principal moved tutors from the 
first to the second grade solely on the basis of 
the new Chapter 1 standards. It makes no 
instructional sense to allow at-risk students to 
fail in reading in first grade and then tutor 
them in second, but this was one unintended 
impact of the new guidelines. The principal 
was simply responding to a perception that 
first-grade achievement did not contribute to 
the success of her school according to the new 
standards. 

TABLE 1 

Hypothetical NCEs for Chapter 1-Eligible Students 

                            Beginning kindergarten      End of Grade 1              End of Grade 2
Measure                     Lowenslow   Brighton-Early  Lowenslow   Brighton-Early  Lowenslow   Brighton-Early

[scores for the other 11 students in each school are missing from this copy]
                            20          20              20          (36)            23          ((39))
                            18          18              18          (34)            21          ((37))
                            16          16              16          (32)            19          ((35))
                            14          14              14          30              17          (33)
                            12          12              12          28              15          (31)
                            9           9               9           25              12          28
                            7           7               7           23              10          26
                            4           4               4           20              7           23
                            1           1               1           17              4           20

Mean NCE (A)                19.3        19.3            19.3        35.3            22.3        38.3
Mean NCE (B)                                            19.3        23.8            22.3        26.8
No. eligible                20          20              20          6               16          4
NCE gain, Grades 1-2                                                                +3.0        +3.0
NCE gain from kindergarten                              0           +16.0           +3.0        +19.0

Note. Mean NCE (A) = mean NCE (normal curve equivalent) for the original group of at-risk students (N = 20 in each 
school). Mean NCE (B) = mean NCE for students who still fall below an NCE of 30 at the end of first grade. Mean NCE 
(B) in spring of second grade minus mean NCE (B) in spring of first grade is the NCE gain for second grade. Scores with 
single parentheses are for students who were below an NCE of 30 in the prior year but not the current year. (These scores 
are used as posttests for NCE gains but not as pretests for the next year's gains.) Scores with double parentheses are for 
students who were in the original set of at-risk students but did not fall below an NCE of 30 in the prior or current year and 
therefore are not included in NCE-gain calculations. 

Rewarding Retention 

A serious consequence of program im- 
provement procedures is that they can inad- 
vertently reward retention of students in 
Grades 2 and up. The reason for this is that 
when students take a test one year and then 
take the same test in the same grade the next 
year, their increase in percentile rank (and 
therefore NCEs) is very large, even though 
the students may in fact have gained little 
beyond the gain attributable to being a year 
older. 

For example, consider a fourth grader who 
scores at the 20th percentile on the California 
Achievement Test (CAT, Form C) Total 
Reading Scale at the end of fourth grade and 
again at the 20th percentile on the CAT at the 
end of fifth grade. If the student had instead 
been retained, the same scale score would 
have placed him or her at the 43rd percentile. 
This apparent "gain" (from the 20th to the 
43rd percentile) is entirely due to retention, 
not to improved performance. Put another 
way, this student went from a grade equivalent 
of 3.4 to one of 4.0. To have actually 
scored at the 43rd percentile in the fifth 
grade, the student would have had to in- 
crease to a grade equivalent of 5.3. That is, 
being retained gave the student an apparent 
bonus of 1.3 years of learning! One study 
which followed students from Grades 2-4 
found that while students who were pro- 
moted each year gained an average of 5.4 
NCEs, those who failed a grade gained 20.7, 
a "retention bonus" of 15.3 points (Karweit, 
1991). Table 2 shows gains in percentiles and 
NCEs for retained students in a Florida 
study. The gains of nearly 17 NCEs in read- 
ing and 21 in math are far greater than the 
national average of 3.0 and 4.3 NCEs, re- 
spectively (Sinclair & Gutman, 1990). Since 
retained students come almost entirely from 
the ranks of students eligible for Chapter 1 
services, even relatively small differences between 
schools in retention rates can lead to 
substantial differences in NCE gains. As a 
result, schools are unintentionally rewarded 
by program improvement guidelines for hav- 
ing high retention rates. 

To illustrate the potential impact of retentions, 
imagine that Lowenslow Elementary 
decided to retain students in Grades 2 
and above scoring at or below the 10th per- 
centile and that this created an apparent 
NCE bonus of 15 points for each retained 
student. The three retained students (a re- 
tention rate of only 3.3% if there are a total 
of 90 second graders) would increase the 
NCE gains for second graders from + 3.0 to 
+ 5.25. Retaining all students with scores 
below an NCE of 20 would still fail only 
7.8% of Lowenslow first graders yet would 
increase apparent NCE gains from + 3.0 to 
+ 8.25. Note that the same retention poli- 
cies would produce no retentions at 
Brighton-Early Elementary, and as a re- 
sult, this school (with a 0% retention rate 
and only four students [4.4%] still quali- 
fying for Chapter 1 services) would appear 
much less successful than its twin. In urban 
school districts where retention rates often 
approach 20%, school-to-school variations 
in retentions could be much more impor- 
tant than actual program effectiveness in 
determining which schools are selected for 
program improvement (see Karweit, 1991). 
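
The figures in this illustration can be reproduced with a small Python sketch, on the assumption, inferred here from the numbers rather than stated in the text, that each retained student adds the 15-point bonus on top of the ordinary 3-NCE gain and that the average is taken over the 20 Chapter 1 students in the grade. The function names are hypothetical.

```python
def apparent_mean_gain(n_students, true_gain, n_retained, retention_bonus):
    """Mean NCE gain reported for a group in which every student makes
    true_gain and each retained student shows retention_bonus in addition."""
    return true_gain + n_retained * retention_bonus / n_students


def retention_rate(n_retained, grade_enrollment):
    """Retention rate as a percentage of all students in the grade."""
    return 100.0 * n_retained / grade_enrollment


# Retaining the three lowest scorers among 20 Chapter 1 second graders.
gain_three_retained = apparent_mean_gain(20, 3.0, 3, 15.0)
rate_three_retained = round(retention_rate(3, 90), 1)

# Retaining the seven students who score below an NCE of 20.
gain_seven_retained = apparent_mean_gain(20, 3.0, 7, 15.0)
rate_seven_retained = round(retention_rate(7, 90), 1)
```

With these inputs the apparent gains are 5.25 and 8.25 NCEs at retention rates of 3.3% and 7.8%, the values given in the text, while a school that retains no one reports only the true gain of 3.0.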

Some school districts may deal with reten- 
tions by excluding them from the analysis of 
NCE gains. This is discouraged by the Chapter 1 
Policy Manual (U.S. Department of 
Education, 1990), which emphasizes the requirement 
to seek a score for every child, but 
in any case it does not solve the problem, 
because retention still removes from the sample 
students who are by definition the lowest 
achievers. 

TABLE 2 

NCE Gains Due to Retention 

                          Year in which student     Year in which grade
                          was retained              was repeated
Grade and test            Percentile   NCE          Percentile   NCE          NCE gain

Reading comprehension
  1                       23           35           62           55           +20
  2                       12           25           42           46           +21
  3                       11           24           33           41           +17
  4                       11           24           28           38           +14
  5                       10           23           23           35           +12
  Mean, 1-5                                                                   +16.8

Math computations
  1                       20           32           66           59           +27
  2                       24           35           65           58           +23
  3                       31           40           69           61           +21
  4                       23           35           50           50           +15
  5                       24           35           55           53           +18
  Mean, 1-5                                                                   +20.8

Note. NCE = normal curve equivalent score. Data are from Pinellas County, Florida, 1978-1979 (from Elligett & 
Tocco, 1983). 

The Chapter 1 Policy Manual does recom- 
mend that schools submit additional informa- 
tion, including retention rates, as a part of its 
documentation of program impacts, but 
there is no federal requirement to do so. In 
any case a school which reports a low reten- 
tion rate and therefore makes small NCE 
gains may still be in program improvement, 
whereas one which produced high apparent 
NCE gains by retaining large numbers of stu- 
dents will avoid program improvement (and 
need not report its retentions). 

To the degree that program improvement 
guidelines accelerate a trend toward increas- 
ing retentions in elementary schools, they 
could have a disastrous effect on at-risk chil- 
dren. Long-term effects of retention are neg- 
ative on many outcomes, academic as well as 
social and behavioral (Shepard & Smith, 
1989). Disadvantaged students who have 
been retained before third grade are very 
unlikely to graduate from high school 
(Lloyd, 1978). Because of a realization of 
these long-term impacts, many urban school 
districts are now seeking to reduce retention 
rates which in many cases have exceeded 
20% for certain grades (such as first). It 
would be tragic if Chapter 1 program im- 
provement guidelines were to unwittingly 
punish districts for moving in this direction. 

We are not suggesting that principals 
would deliberately fail more children to arti- 
ficially increase NCE gains. However, 
schools which reduce their retention rates 
will have lower NCE gains and may be mis- 
takenly singled out for criticism. Those which 
increase their retentions will have higher 
gains and may be mistakenly singled out for 
praise. This could ultimately result in a shift 
toward policies which increase retentions. 

Rewarding Teaching the Test

A problem that has always plagued the use
of standardized test scores for high-stakes 
accountability is the degree to which schools 



can appear to do well by teaching very nar- 
rowly those skills assessed on the tests (and 
ignoring other content). The new program 
improvement guidelines improve on this by 
emphasizing scores on advanced, as well as 
basic, skills, but the "advanced" skills in 
question are really the same scales that have
typically been given in the past, such as read- 
ing comprehension and math concepts and 
applications. 

Practices which fall under the heading of 
"teaching the test" range from the relatively 
benign to the unethical (see Stringfield, 
Hartman, Pechman, & Brooks, 1985). At the 
benign end is "curriculum alignment," a
focus of teaching efforts on the general skills 
or concepts being tested. Curriculum align- 
ment is justifiable to the extent that one ac- 
cepts what is on the test as the full range of 
what children should learn, an assumption 
that is perhaps tenable in some areas (such as 
math computation) and untenable in others 
(such as language arts tests without writing 
samples). Teaching general test-taking skills
also certainly falls on the benign end. How- 
ever, both curriculum alignment and test 
skills are often overdone in high-stakes test- 
ing. For example, many urban elementary 
schools have little serious instruction in social 
studies, science, or writing, at least in part 
because these are not on the standardized 
tests. Many districts and schools have care- 
fully examined the standardized tests and 
rooted out from their curriculum any objec- 
tives not explicitly tested. This has led in 
many cases to a great deal of teaching of 
isolated skills that is counterproductive to 
learning (but does improve test scores). 

At the unethical end of the "teaching the 
test" continuum fall a variety of undesirable 
practices. In one common situation teachers 
become familiar with particular tests and
make sure that they teach specific items 
known to be on the tests. For example, ele- 
mentary vocabulary scales rarely involve 
more than 15 words, and teachers often learn 
these words and make certain to emphasize 
them in their teaching (in lieu of other kinds 
of vocabulary teaching). 

The effects of teaching the test, teaching 
test-taking skills, and other means of increasing
student scores without increasing their
learning can be considerable. In a recent
study, Koretz, Linn, Dunbar, and Shepard 
(1991) administered additional tests after 
standardized testing in "high-stakes testing" 
districts. One of the additional tests had been 
equated with the standardized tests in low- 
stakes districts, yet in the high-stakes districts 
school means on the alternative tests were 
substantially lower (by as much as 16 percent- 
ile points) than scores on the standardized 
tests used for accountability. The difference, 
the authors argue, is due to "teaching the 
test" and teaching test-taking skills; item-by- 
item comparisons (Flexner, 1991) and 
teacher surveys (Shepard & Dougherty, 
1991) support this interpretation. 

The new program improvement guidelines 
do not break any new ground in rewarding 
teaching to the test: all accountability programs
suffer from this. What is new, however,
is that the new standards raise the
stakes for Chapter 1 schools and may thereby 
perpetuate a long-standing problem. 

Other Problems of Chapter 1 Assessment 

In addition to those mentioned above, 
there are several other problems with Chap- 
ter 1 assessment procedures which may not 
reward inappropriate policies but still may 
lead to problems in accurately identifying ef- 
fective and ineffective programs. Chapter 1 
assessments may be based on fewer than half 
of Chapter 1-eligible students (Bushner,
1991). If the students who took both tests
were representative of all Chapter 1 students,
this would be a minor problem, but it is
more likely that missing students would make 
lower-than-average gains; thus, excluding 
them may overstate apparent gains. Worse, 
some schools may be less than relentless in 
obtaining a test from absent children who are 
unlikely to do well. 

The reliability of NCE gains as an indicator 
of school effectiveness is another serious 
problem. Gain scores always have less re- 
liability than do point-in-time scores, but this 
problem is compounded by any number of 
random factors, including the problem of 
missing students mentioned earlier. As one 
indication of the unreliability of NCE gains, 
Bushner (1991) compared fall-to-fall and
spring-to-spring scores for the same schools
in the same year and found no correspondence
between the two; in fact, among the six
schools followed, the highest-gaining school 
in the fall-to-fall assessment was the second 
to lowest in the spring-to-spring data, and the 
lowest-gaining fall-to-fall school was the sec- 
ond highest-gaining in spring-to-spring as- 
sessments. Consistency over time was also 
low; the second and third highest-gaining 
schools in the spring-to-spring 1988-1989 as- 
sessments were the two lowest-gaining 
schools on spring-to-spring 1989-1990 as- 
sessments. If NCE gains are to be used as 
high-stakes indicators of program effective- 
ness, they must be stable, meaningful, and 
reliable indicators. Clearly, this is not the 
case. 
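The claim that gain scores are less reliable than point-in-time scores can be illustrated with the classical test theory formula for the reliability of a difference score. The article does not state this formula; the sketch below assumes equal pretest and posttest variances, and the function name and the example reliabilities are hypothetical.

```python
def gain_score_reliability(r_pre, r_post, r_pre_post):
    """Reliability of a gain (difference) score under classical test theory.

    Assumes the pretest and posttest have equal variances.
    r_pre, r_post: reliabilities of the two point-in-time scores.
    r_pre_post:    correlation between pretest and posttest.
    """
    if r_pre_post >= 1:
        raise ValueError("pre/post correlation must be below 1")
    return ((r_pre + r_post) / 2 - r_pre_post) / (1 - r_pre_post)


# Hypothetical values: two quite reliable tests that correlate strongly.
print(gain_score_reliability(0.9, 0.9, 0.8))
```

With these illustrative inputs the gain score is far less reliable than either test taken alone, because a high correlation between the two tests leaves little true-score variance in the difference.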

Finally, there is a problem of statistical 
regression that has long been noted in Chap- 
ter 1 evaluations (see Gabriel et al., 1985). 
That is, entirely because of random variation 
(e.g., bad luck), some students score below 
the cutoff for Chapter 1 services. Since bad 
luck is unlikely to happen twice, the student's 
score next year is likely to be higher, a situa- 
tion which creates an apparent positive effect 
in the current Chapter 1 assessment system.
In addition, fluctuation around the cutoff 
score can create an illusion of gain. For exam- 
ple, imagine that a student's NCE scores are 
28, 32, 28, 32, 28 at the end of Grades 1-5, 
respectively, in a district using an NCE of 30 
as a criterion for Chapter 1 eligibility. This 
student would show a gain of 4 NCEs in 
Grade 2. In Grade 3 he or she would not 
receive services, so the loss of 4 NCEs does 
not count on the Chapter 1 assessment. Then 
he or she appears to gain again in Grade 4, 
and so on. The Chapter 1 policy manual rec- 
ognizes this problem and invites districts to 
correct scores for regression if they wish, but
it is doubtful that any would do so because it
is difficult and would have the effect of reduc- 
ing scores. 
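The fluctuation example can be traced step by step. The sketch below assumes a simplified rule that is not spelled out in the article: a student is served in a given grade only if the previous year's score fell below the cutoff, and only served years contribute a gain to the evaluation. The function name is hypothetical.

```python
def counted_gains(end_of_year_nce, cutoff=30):
    """Gains that would be counted in the Chapter 1 evaluation.

    Simplifying assumption for illustration: a student is served in a grade
    only when the prior end-of-year score is below the cutoff, and only
    served years are counted.
    """
    counted = []
    for previous, current in zip(end_of_year_nce, end_of_year_nce[1:]):
        if previous < cutoff:  # below the cutoff, so served this year
            counted.append(current - previous)
    return counted


scores = [28, 32, 28, 32, 28]  # end of Grades 1-5, from the example in the text
gains = counted_gains(scores)
net_change = scores[-1] - scores[0]
print(gains, net_change)
```

Under this rule the evaluation records two gains of 4 NCEs each, although the student's score at the end of Grade 5 is identical to the score at the end of Grade 1.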

Alternative Approaches 

It is easy to criticize any accountability 
program but far more difficult to suggest 
practical alternatives. It is neither politically 
possible nor desirable to do away with ac- 
countability in Chapter 1 programs; we must 
have some outcome-based criterion on which 




to judge the impact of Chapter 1 in each 
school and district. The program improve- 
ment guidelines implemented under the 
Hawkins-Stafford bill are improvements in 
many ways over earlier procedures. Yet they 
still reward some policies we would want to 
discourage, and they punish policies we 
would want to encourage; thus, they are in 
need of major change. The following sections
discuss a set of recommendations for a system 
which might accomplish the goals for which 
program improvement guidelines were origi- 
nally designed. 

Broad-Based Tests 

The most important thing we have learned 
after 15 years of "accountability" in educa- 
tion is that high-stakes assessments do in fact 
drive instruction and other school practices 
but that if schools can find an easier way to 
affect assessments than to do a better job of 
teaching, they will often do so (Koretz et al., 
1991). Therefore, assessments must be de- 
signed to be so broad and so appropriate to 
what we want students to do that they are 
worth teaching to and cannot be influenced 
by any kind of narrow teaching or "test-wiseness."
The best model we have for such a test
is NAEP, which uses matrix sampling, 
whereby different students take different 
portions of a very comprehensive test. 

Broad-based tests should include some 
forms of "authentic" assessment, such as in- 
dividually administered tests involving read- 
ing and comprehending real children's litera- 
ture, writing samples, and open-ended prob- 
lem solving in math, along with basic skills. 
The use of matrix sampling does not provide 
ideal student scores, but it does provide ex- 
cellent information on school effectiveness. 
Most important, the use of a broad-based test 
would reward broad-based teaching, and use 
of "authentic" measures would reward the 
teaching of meaningful reading (not only 
skills), meaningful writing (not only lan- 
guage mechanics), and meaningful math (not 
only algorithms). Students might still take 
diagnostic tests to determine eligibility for 
services and as formative tests for school use, 
but the assessment of the program (as dis- 
tinct from the students) would be based on 
these broader assessments. Some core tests 



might be given in every grade, while others 
(possibly including tests of science and social 
studies) might be given every few years. 

An Alternative Model 
of Chapter 1 Assessment

We propose a system in which Chapter 1 
schools are evaluated on the degree to which 
they can reduce the number of students in 
need of remedial services. In this system, 
states (or districts) would set minimum performance
criteria for students at each grade
level from pre-K on. The tests used at differ- 
ent grade levels could be different. For exam- 
ple, tests of preschool and kindergarten pro- 
grams might focus on language development, 
and first-grade tests might include individual 
reading assessments, perhaps administered 
by Chapter 1 teachers from other schools. In 
the early years of such a system, passing 
scores could be established for existing stan- 
dardized tests, but as states introduce new, 
more appropriate measures at selected grade 
levels, minimum performance criteria could 
be established for them. If matrix sampling is 
used, passing scores for each test form could 
easily be established. Existing standardized 
tests could be used until better tests are es- 
tablished at each grade level. Each student 
would be identified as meeting or not meet- 
ing minimum standards. Any students who 
were retained would be counted as not meet- 
ing standards, as would any student on roll in 
the spring and in the school all year who did 
not take a test. The idea here is to encourage 
schools to promote students and to try to 
obtain a valid test from every Chapter 1 stu- 
dent. Retaining students or failing to give 
them a test would provide no benefit to the 
school's scores because these students are 
counted into the school's total as not meeting 
standards. 
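The counting rule described above can be sketched as follows. The class, field names, and passing score are hypothetical; the only rule taken from the text is that retained students and enrolled students without a test count as not meeting standards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Student:
    score: Optional[float]   # None means no test was obtained
    retained: bool = False   # True if the student was held back this year


def meets_standard(student: Student, passing_score: float) -> bool:
    """Retained or untested students always count as not meeting standards."""
    if student.retained or student.score is None:
        return False
    return student.score >= passing_score


def proportion_meeting(students, passing_score):
    """Share of all enrolled students who meet the minimum standard."""
    met = sum(meets_standard(s, passing_score) for s in students)
    return met / len(students)


# Hypothetical roster: one pass, one fail, one retained, one untested.
roster = [Student(35), Student(28), Student(40, retained=True), Student(None)]
print(proportion_meeting(roster, passing_score=30))
```

Because the denominator is every enrolled student, retaining a student or failing to test one can only lower the proportion, which is the incentive the proposal intends.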

In this system, the school would be re- 
warded for successively reducing its propor- 
tion of students failing to meet minimum 
standards, combining across all grade levels 
at which Chapter 1 dollars are spent. The 
initial baseline for this comparison would be 
based on a determination of how many stu- 
dents would have met minimum standards on 
tests given for three years before the new 
assessment system was implemented. That
is, standards for existing standardized tests
would be established and used retrospectively
to establish a baseline. This conforms
to the critical principle that any baseline established
for high-stakes assessment itself be
a high-stakes assessment, so that schools 
would already be doing their best. After the 
first year, the proportion of students meeting 
minimum standards would always be com- 
pared with the proportion in the previous 
three years, so zigs and zags in baselines 
would not influence ratings of school success. 
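One way to read the three-year comparison is as a rolling baseline. The text does not say how the three prior years would be combined; the sketch below assumes a simple average, and the function names and proportions are hypothetical.

```python
def baseline(previous_three_years):
    """Average proportion meeting standards over the three prior years.

    Averaging is an assumption made for illustration; the proposal only
    says the current year is compared with the previous three years.
    """
    if len(previous_three_years) != 3:
        raise ValueError("expected exactly three prior years")
    return sum(previous_three_years) / 3


def improvement(current, previous_three_years):
    """Change in the proportion meeting standards relative to the baseline."""
    return current - baseline(previous_three_years)


# Hypothetical proportions: a dip in one prior year is smoothed by averaging.
print(round(improvement(0.62, [0.50, 0.58, 0.54]), 2))
```

Averaging several years means that a single unusually good or bad year in the baseline has only a limited effect on the rating.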

There are several important advantages to 
this system. First, it would allow for easy 
pooling of results across all grade levels; thus, 
schools could appropriately assess preschool, 
kindergarten, and first-grade students and 
have their successes added to the school's 
success. This also allows schools to use "au- 
thentic" tests, criterion-referenced tests, and 
other measures at some or all grade levels 
and thereby releases them from the require- 
ment to use standardized tests solely because 
they produce NCE scores. 

Because this system would use a point-in- 
time measure and because it would include 
an incentive to obtain a valid test from every 
student, the problems of missing data would 
be greatly reduced. 

The system we have proposed would en- 
courage early intervention. A child who re- 
ceived, for example, one-to-one tutoring in 
reading in first grade and therefore never 
needed further remediation would count ev- 
ery year as exceeding minimum standards. In 
the example comparing Brighton-Early Ele- 
mentary to Lowenslow Elementary, the suc- 
cess of the early intervention at Brighton- 
Early would be clearly shown in this assess- 
ment system. What is important about 
Brighton-Early is not the increase in NCEs it 
brought about among its at-risk students but 
the reduction in the number of students fal- 
ling below minimum standards while a zero 
retention rate was maintained. 

Rewarding Success 

Accountability systems primarily motivate 
educators to do their best by giving them 
internal benchmarks to judge their progress 
toward desired goals and by publicizing 
schools that are doing well or poorly. A principal
wants his or her school to do well out of
a sense of professional pride and wants to 
avoid falling into program improvement be- 
cause it is embarrassing and potentially dam- 
aging to his or her career, not because the 
school staff has to change plans or attend 
workshops. Simply providing feedback on 
progress toward reducing the need for reme- 
dial services may be enough in most districts. 

However, there may be a rationale for re- 
warding schools for increasing the proportion 
of students meeting minimum standards by 
giving the staff greater freedom in using 
Chapter 1 dollars and for imposing more re- 
strictions on schools failing to reduce their 
Chapter 1 caseloads. For example, schools 
with a record of moving students out of Chap- 
ter 1 eligibility might qualify for schoolwide 
status, even if they do not meet the 75% 
poverty criterion. In contrast, schools failing 
to move students might be placed under sub- 
stantial scrutiny by local and state regulators. 

Potential Problems and Potential Solutions 

As in any accountability system, there are 
several problems with the one we have pro- 
posed. First, a school undergoing major de- 
mographic changes might appear to be de- 
clining in the percentage of students meeting 
minimum standards. This could be dealt with 
by allowing schools to submit demographic 
data (e.g., increases in the percentage of stu- 
dents qualifying for free lunch) to explain any 
declines. 

Second, a school which has always done a 
good job will start with a higher baseline than 
one that has done poorly and may therefore 
have more difficulty making further gains. 
Using a three-year baseline would somewhat 
diminish this problem, but the only real re- 
sponse is to note that any Chapter 1 school 
can always get better. 

Third, it may be unfair to hold schools fully 
responsible for students new to the school. 
This problem might be solved by counting 
only students in the school for at least two 
years. 

Another potential problem involves the 
use of a single criterion of success for each 
assessment. If this criterion were set too low, 
it might focus schools on minimum skills, 
whereas a high standard might lead them to
focus on students near the passing score, ignoring
those felt to be unlikely to pass under
any circumstances. One solution to this prob- 
lem might be to set two standards for each 
test: a "minimum" standard and a "basic"
standard, where the basic level would 
roughly correspond to what is called "at 
grade level" on today's tests. Chapter 1 
schools might be evaluated according to the 
degree to which they could move students 
beyond both of these standards. Advanced 
levels might also be established, particularly 
for use in schoolwide projects in which Chap- 
ter 1 has a legitimate interest in the perfor- 
mance of all students, not only low achievers. 

Another important limitation of this sys- 
tem is that it uses as a criterion of success the 
very measure that qualifies a school for 
Chapter 1 funding in some districts; thus, a 
school doing well on the assessments could be 
reducing its Chapter 1 resources. A solution 
to this would be to base Chapter 1 funding 
solely on poverty. Other "hold harmless" 
provisions might be applied to make certain 
that schools which are reducing their Chapter 
1 caseloads are not penalized for doing so. 

Program Improvement 

To live up to its name, program improve- 
ment must go beyond being primarily an ac- 
countability program and must devote much 
more attention and resources to actually im- 
proving programs. Chapter 1 needs to play a 
far greater role in staff development and in 
providing proven programs to students. The 
Hawkins-Stafford bill provided very modest 
funds for staff development, but a far greater 
focus on this aspect of Chapter 1 services is 
still needed (see Slavin, 1991). Ideally, 
schools should be able to receive on-site as- 
sistance to help them implement effective 
practices. This could be provided by state or 
regional Chapter 1 Effectiveness Centers 
staffed by professionals trained in various ef- 
fective models and in the dissemination and 
implementation of effective practices (see 
Slavin, 1987). Such services should be avail- 
able to all Chapter 1 schools or perhaps to 
schools serving large numbers of Chapter 1 
students; these services should not be seen as 
a trip to the woodshed for schools who don't 
measure up. However, there would obviously 



be pressure for schools not meeting adequate 
standards to change programs and to invite in 
experts on effective models. 

Conclusions 

Throughout this article we have discussed 
the possibility that certain features of pro- 
gram improvement guidelines may reward 
schools for implementing inappropriate poli- 
cies, such as avoiding early intervention, in- 
creasing retentions, or teaching to a narrow 
set of objectives. We do not mean to suggest 
that large numbers of principals would take 
advantage of these provisions to increase students'
scores without increasing their learning.
Rather, the danger is that schools which
are working in good faith to implement early 
intervention models, to reduce retentions, 
and to encourage teachers to teach a full and 
appropriate curriculum may end up looking 
mediocre or worse in terms of NCE gain, 
even if their Chapter 1 children are in fact 
succeeding. At the same time, schools which
are emphasizing remediation rather than 
prevention, retaining large numbers of stu- 
dents, and teaching narrowly to the stan- 
dardized tests may mistakenly be held up as 
positive examples because of high NCE 
gains. If this occurs, innovative schools could 
become discouraged with reform and could 
return to the more traditional Chapter 1 
practices which are more in line with the 
existing standards. 

The solutions proposed in this article rep- 
resent only a few among many possible 
ways we might revise program improve- 
ment guidelines. Our intention is simply to 
begin a discussion about modifications in 
program improvement guidelines to put 
them firmly behind (or at least not in the way
of) school policies likely to benefit Chapter 
1 children. Chapter 1 means a great deal to 
our most vulnerable children. We cannot 
rest until we are sure that Chapter 1 dollars 
are buying the most effective programs pos- 
sible and that Chapter 1 policies are re- 
warding school practices conducive to the 
success of all children. 

Notes 

We would like to thank Sam Stringfield, James
Smith, Mary Jean LeTendre, Nancy Karweit,
Steve Davidoff, and Rita Altman for their comments
on earlier drafts of this paper. However, we
take full responsibility for the opinions presented. 

This paper was written under a grant from the 
Office of Educational Research and Improvement, 
U.S. Department of Education (No. OERI- 
R-117-R-90002). However, any opinions ex- 
pressed are our own and do not represent OERI 
positions or policies. 

1 A normal curve equivalent is a statistic similar 
to a percentile which ranges from 1 to 99, with a 
mean of 50 and a standard deviation of approximately
21.

Chapter 1 Program Improvement: Cause for Cautious Optimism
and a Call for Much More Research

The program improvement provisions of the Hawkins-Stafford Amendments to Chapter 1 rest
on the optimistic premise that school-level accountability pressures directed at Chapter 1 will lead
to higher academic achievement for educationally disadvantaged students. Although the legislation
may be unrealistic in assuming that improvement is primarily an act of will, it correctly
focuses on the school as the appropriate unit for change. Principals of over 200 schools identified 
for program improvement in three states were surveyed to determine local responses to the new 
provisions. Over two-thirds of responding schools had begun to implement programmatic 
changes. Fully 84% supported the legislative provisions. Research is called for to study the effects 
of the legislation and to provide additional options to low-performing schools. 



The most important and most optimistic sec- 
tions of the 1988 Hawkins-Stafford Amend- 
ments to Chapter 1 were those dealing with 
program improvement. Hawkins-Stafford 
clearly reaffirmed that Title I/Chapter 1 was 
to be an educational program, not merely a 
funding program. By focusing on program 
improvement, the authors of the legislation 
set a tone and academic direction which has 
permeated discussions of Chapter 1. By link- 
ing program improvement to each school's 
Chapter 1 evaluation, the program improve- 
ment requirements reawakened local educa- 
tors to the potentially powerful links between 
evaluation data and programming options. 
These were among the directions the original 
authors of Title I intended and were among 
the connections between evaluation and in- 
struction which Congress had sought for over 
20 years. The program improvement sections
bore the unmistakable optimism of reformers.
On a technical front, the authors
assumed that local evaluations could be con- 
ducted which would possess sufficient re- 
liability, validity, and clarity to serve two pur- 
poses. Hawkins-Stafford stipulates that local 
evaluations will be used to target poorly per- 
forming schools and to guide program im- 
provement. 

More broadly, the legislation assumes that 
there exist sufficient research, practical wis- 
dom, and professional will so that teachers, 
paraprofessionals, and administrators in 
thousands of local schools — assisted by their 
districts and state departments of educa- 
tion — can and will improve the quality and 
quantity of services to their most needy stu- 
dents. Given the number of negative reports 
on the state of American education which 
had poured forth in the preceding five years,
Hawkins-Stafford might seem extraordinarily
optimistic. However, the Congress
tempered its high hopes with a series of steps 
that schools which do not meet the high ex- 
pectations of Hawkins-Stafford must under- 
take. A stick came with the carrot. 

We believe some of that optimism was jus- 
tified and that over the next several years 
prudent local and national action can result in 
a state of affairs in which today's optimism 
becomes tomorrow's fact. In this article we 
examine program improvement on a practical
level, overview the challenges facing persons
attempting program improvement, describe
one study of the effects on local
educators of participating in program im- 
provement, and draw implications for the 
1993 reauthorization of Chapter 1. 

The Practical Workings of Chapter 1 
Program Improvement 

The 1988 reauthorization of Chapter 1 re- 
quired states to establish "Committees of 
Practitioners" which would set and peri- 
odically examine minimum standards for lo- 
cal schools to use in demonstrating the effec- 
tiveness of their compensatory education 
programs. As Heid (1991, this issue) notes, 
most state committees have set a minimum 
standard of "more than zero NCE gains on
norm-referenced achievement tests (NRTs)."
(An NCE is a normalized standard score 
matching the percentile distribution at values 
of 1, 50, and 99, with a mean of 50 and a 
standard deviation of 21.06.) As a practical 
matter, the states were attempting to declare 
that a child must show more gain than would 
statistically be expected without Chapter 1. 
States have allowed schools to set additional 
criteria beyond NRT gains, but the wording 
of the law has typically been interpreted as 
indicating that if a school does not achieve all 
of its goals, it must enter into program im- 
provement. This interpretation has provided 
little incentive for local schools to place addi- 
tional requirements on themselves, and most 
schools appear to be declaring the minimum 
goals on the minimum number of criteria. 
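The definition of an NCE given above implies a direct conversion from a percentile rank. The following is a minimal sketch using the Python standard library; the function name is hypothetical, and published conversion tables may differ from this calculation by a point because of rounding.

```python
from statistics import NormalDist

NCE_MEAN = 50
NCE_SD = 21.06  # standard deviation stated in the text


def percentile_to_nce(percentile):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE.

    The percentile is mapped to a standard normal z score, which is then
    rescaled to a mean of 50 and a standard deviation of 21.06.
    """
    z = NormalDist().inv_cdf(percentile / 100)
    return NCE_MEAN + NCE_SD * z


for p in (1, 10, 50, 99):
    print(p, round(percentile_to_nce(p)))
```

The scale agrees with percentiles at 1, 50, and 99 but not in between: a percentile of 10, for example, corresponds to an NCE of about 23. Unlike percentiles, NCEs are treated as an equal-interval scale, which is why gains are expressed in NCE units.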

If a school's Chapter 1 students do not 
demonstrate the gains the school declared it 
would achieve, the school is identified as 
needing program improvement. Hawkins- 



Stafford states that during the first school 
year after identification, the school must con- 
sult with parents and write a program im- 
provement plan. The plan is approved by the 
local educational agency (LEA) governing 
board and submitted to the state. Regula- 
tions suggest that minor improvements be 
implemented immediately; they allow up to a 
year for the implementation of more major 
changes. If there has been no improvement 
during the following year, the school must 
enter into a joint planning agreement with 
the state education agency (SEA) and local 
education agency. This process repeats until 
the school shows achievement gains. 

Challenges Facing the Program 
Improvement Initiative 

In the initial stages of any worthy under- 
taking, more reasons to predict failure can be 
listed than reasons to predict success. Pro- 
gram improvement is currently passing 
through such a period. The challenges facing 
honest efforts at implementing the program 
improvement sections of Hawkins-Stafford 
include, but are not limited to, the following: 
The conditions facing disadvantaged children 
have deteriorated considerably over the last 
11 years, and they have been compounded by
the fact that there is a difficult-to-reverse
gravitation of highly skilled teachers and administrators
to schools serving the most advantaged
students; the levels of coordination
between regular and Chapter 1 programs
have often been inadequate; statute implications
were often not fully explained to those
who are being held most accountable; the 
existing technologies for achieving the goals 
of Hawkins-Stafford are often not strong and 
just as often poorly disseminated; and there 
remain technical problems with Chapter 1 
evaluation techniques. 

Worsening conditions. The conditions fac- 
ing an increasingly large percentage of 
America's families and their children have 
worsened over the last decade, and it appears 
that the numbers of children being raised in 
poverty will increase over the next decade 
(Natriello, McDill, & Pallas, 1990; Wilson,
1987). The educational significance of those
statistics is probably best understood
through careful case studies. Kotlowitz
(1991), for example, describes the lives of
two youngsters growing up amid the drugs, 
shootings, and general societal collapse of 
Chicago's slums. The children live in abom- 
inable, unsafe conditions. They regularly 
have to hide from the stray bullets of drug- 
and gang-related shootouts. They do not al- 
ways have enough food. Children in these 
circumstances can hardly be expected to concentrate
on academics with the same single-minded
ease as do suburban children, regardless
of the quality of their schools.

"The rich [schools] get richer and the
poor. . . ." There exists in American education
a gravitational-like pull from schools offering
the most resources to attract the most
highly trained principals and staff. Wimpel- 
berg, Teddlie, and Stringfield (1989) de- 
scribed this phenomenon as it affected two 
schools in one large system. The non-Chap- 
ter 1 magnet school had over 40 highly quali- 
fied applicants for every teaching opening, 
while the principal of the school serving a 
90+% free-lunch population often had to 
wait months for one qualified applicant for a 
position. It is relatively easy to motivate 
teachers to implement new programs when 
they know that there are 40 other qualified 
teachers eagerly awaiting their departure. 
The same motivational task is much more 
difficult when a principal knows that if he or 
she pressures a teacher too much, that 
teacher may leave, and some of the school's 
students may be served by "permanent sub- 
stitutes'' for the remainder of the school year. 

Historical isolation of categorical pro- 
grams. In many school districts, Chapter 1 
had inadvertently become an isolated cate- 
gorical program. The long-standing federal 
requirement that programs "supplement not 
supplant" regular programs had heightened 
this isolation. Many regular classroom
teachers felt no connection at all to "Chap- 
ter." Many regular and Chapter 1 teachers 
had little or no knowledge of each other's 
programs, curricula, or instructional tech- 
niques. Yet, the Hawkins-Stafford Amend- 
ments hold whole schools accountable for 
students' success. Regular teachers and prin- 
cipals were unaccustomed to having input 
into the design and evaluation of their Chapter 1
services.

Occasional staffing concerns. In too many
school districts, "Chapter" had become the 
retreat of highly senior staff who, regardless 
of their instructional talents, no longer 
wished to deal with the demands of 20-40 
students and who preferred working with 
groups of 2-5 students at a time in controlled 
environments. In some districts, Chapter 1 
had become the last refuge for teachers who 
would have been placed on probation or fired 
if a convenient place had not been available 
"where they won't hurt as many children." 
The chief teachers' union representative of 
one of the nation's 50 largest school districts 
once explained to the first author that she was 
able to virtually eliminate disciplinary ac- 
tions against teachers by having incompetent 
teachers shifted to Chapter 1. The school 
districts that allowed such practices now lack 
solid foundations on which to build improv- 
ing programs. 

Breakdowns in information flow. In many 
instances, the intentions and requirements of 
the legislation were not clearly articulated to 
local principals, teachers, and paraprofes- 
sionals who were responsible for successfully 
achieving provisions of the new law. In most 
states annual meetings of local Chapter 1 
coordinators are held to introduce any new 
wrinkles in the Chapter 1 law and regula- 
tions. Many local Chapter 1 coordinators re- 
port learning about small changes which 
were mandated with much fanfare one year 
and retracted the next. This history tended to 
have a deadening effect on local program 
coordinators' reactions to announced 
changes at the federal level. When truly ma- 
jor change came, local coordinators tended 
to take a "wait and see" attitude toward the 
new law. Hawkins-Stafford was not a "wait 
and see" piece of legislation. What had been 
adaptive behavior for local federal program 
administrators became dysfunctional. 

State Chapter 1 meetings are rarely at- 
tended by principals, and almost never by 
teachers. In probably thousands of school 
districts, the school-site principals, teachers, 
and paraprofessionals who were later held 
accountable for achievement gains under 
Hawkins-Stafford were often unaware of the 
requirement until they learned that they had 
been "targeted" (a most unfortunate choice 
of words) for program improvement. Not
surprisingly, "targeted" people have tended 
to respond defensively. Defensiveness is an 
excellent negative predictor of meaningful 
instructional change. Many of the federal, 
state, and local implementors of Hawkins- 
Stafford inadvertently created worst-case 
scenarios for initiating meaningful program 
improvement. 

Lack of research support. Program im- 
provement assumes the availability of pro- 
grams which are effective and transportable. 
Unfortunately, federal funding of program- 
matic research came to a virtual close after 
the unsuccessful Follow Through evaluations 
(Stallings & Kaskowitz, 1974). Good (per- 
sonal communication, February, 1989) esti- 
mated that, in constant dollars, federal fund- 
ing for educational research had dropped 
over 80% since 1973. To hold schools accountable
for making programs work without
providing research clearly indicating which 
programs achieve specific goals and which 
don't was a considerable act of optimism. 
Promising new beginnings in federal support 
for Chapter 1 research have been reported by 
Plisco and Scott (1991), but much more is 
needed. 

The programs which have at least mod- 
est evidence of effectiveness have not been 
well disseminated. Fullan (1982, 1991), 
Rosenblum and Louis (1981), Louis and 
Miles (1990), and Showers, Joyce, and Ben- 
nett (1987) provide clear conclusions regard- 
ing the conditions necessary for meaningful 
implementation of new programs. These in- 
clude multiyear processes with central roles 
for leadership and technical assistance. Clear 
visions and goals, early success, sustained 
interactions among the people being asked to 
change, and intensive staff development for 
everyone involved are research-supported el- 
ements of sustained change. At the teacher 
level, presentation of theory combined with 
modeling of appropriate new behaviors, op- 
portunities to practice new behaviors, quick 
and accurate feedback, and ongoing coach- 
ing are all supported by research. 

These are hardly cost-free elements. Yet 
funding for the National Diffusion Network 
and other dissemination activities shrank 
dramatically during the 1980s and now offers
very few of the activities and almost none of
the extended follow-up necessary for success- 
ful program implementation. The Council of 
Chief State School Officers (1991) estimates 
that there are 9,000 schools currently identi- 
fied as needing program improvement. For 
20 years state education agencies have been 
required to focus on technical compliance 
issues. Even if there were enough research 
available, there simply are not enough diffu- 
sion resources to meet this wellspring of de- 
mand. 

Evaluation use issues. Finally, there are 
technical and substantive problems with 
measurement and evaluation in Chapter 1. 
Much has been written about this elsewhere 
(see, for example, Davis, 1991, this issue; 
and Slavin & Madden, 1991, this issue). If 
there is one chance in 20 that a program has 
been misidentified on a technicality, then 
there will be at least 15 of 20 schools who 
perceive themselves to be the one. This can 
lead to either of two further technical problems.
Principals and faculty may become so
convinced that they have been unfairly tar- 
geted that they resist all suggestions and ef- 
forts at program improvement. Alternately, 
schools may opt to teach "test taking skills" 
or simply teach the test. Such processes not 
only take time from instruction and risk in- 
validating the scores, but they also may 
greatly raise the "pretest" scores from which
next year's "posttests" must show gains.
Thus they risk creating and then perpetuat- 
ing procedures which are, at best, invalidat- 
ing the evaluation and, at worst, unethical. 

In sum, the above difficulties could lead to 
an easy prediction of failure for Chapter 1 
program improvement. Yet while adequate 
national studies of program improvement 
have not been funded, we are inclined to 
believe that in most schools and most states 
program improvement is working. 

In a previous article, we reviewed four 
studies which indicated that schools which 
participate in a year-long guided process of 
planning and implementing Chapter 1 pro- 
gram improvement tend to begin showing 
achievement gain the following year (String- 
field, Billig, & Davis, 1991). One topic which 
was not central to those studies but which has 
become central to the ongoing debate over
the Hawkins-Stafford Amendments concerns
the actions and perceptions of staff
members in schools targeted for program im- 
provement. It is important to know what lo- 
cal teachers and principals are doing and 
thinking once they are identified for program 
improvement. If they respond defensively 
because they see no practical options for im- 
proving students' learning, then program im- 
provement will fail. If most see genuine steps 
they can take to improve their programs and 
get themselves "untargeted," and they are 
taking those steps, then program improve- 
ment may succeed. 

Local Educators' Perceptions of the Effects 
of Chapter 1 Program Improvement 

To determine local educators' responses to 
being targeted for program improvement, 
questionnaires were sent to the principals in 
all schools identified for Chapter 1 program 
improvement in three states. One of the 
states is located in the South, one in the Mid- 
west, and one in the Southwest. Over 200 
questionnaires were sent, and responses 
were received from just over 52% of the 
schools surveyed. It is possible that this re- 
sponse rate resulted in a biased sample, but 
the direction of any bias is not clear to the 
researchers. The respondents were not al- 
ways principals. In 15% of the responses, 
questionnaires were completed by Chapter 1 
coordinators, teachers, curriculum devel- 
opers, and other staff. In several cases, the 
entire school's improvement team completed 
copies of the questionnaire. When more than 
one respondent from a school answered a 
survey, the answers were combined so that all 
schools would receive equal weighting. 
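
The equal-weighting step described above can be illustrated with a minimal sketch. The article does not say how answers from several respondents at one school were combined; the sketch assumes, purely for illustration, that when k people from a school replied, each reply counts as 1/k of that school's single vote. The function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def school_weighted_shares(responses):
    """Return each answer's share of the sample, one vote per school.

    `responses` is a list of (school_id, answer) pairs. A school with
    k respondents gives each of them a weight of 1/k. This combining
    rule is an assumption; the article does not specify its own rule.
    """
    by_school = defaultdict(list)
    for school_id, answer in responses:
        by_school[school_id].append(answer)

    totals = Counter()
    for answers in by_school.values():
        weight = 1.0 / len(answers)  # split the school's one vote
        for answer in answers:
            totals[answer] += weight

    n_schools = len(by_school)
    return {answer: total / n_schools for answer, total in totals.items()}


# Hypothetical data: school A returned three questionnaires, B and C one each.
sample = [
    ("A", "positive"), ("A", "positive"), ("A", "negative"),
    ("B", "positive"),
    ("C", "neutral"),
]
shares = school_weighted_shares(sample)
```

Under this assumed rule, school A's three questionnaires together carry the same weight as the single questionnaire from school B or from school C, so a large improvement team cannot dominate the totals.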

Questions were open ended, and many re- 
spondents provided multiple answers to sin- 
gle questions. This resulted in a rich data set, 
but one that did not lend itself to quantitative
analyses. Results will be presented as they 
relate to six overriding issues: Did the partici- 
pants understand why they had been identi- 
fied? Had they actually made changes? What 
factors did educators perceive to be inhibi- 
ting change? Facilitating change? Was evi- 
dence of outcomes available? Finally, the 
questionnaire asked educators whether the 
program improvement legislation was having 
a positive or negative effect on schools and
schooling. The first five questions will be ad- 
dressed briefly, followed by a more detailed 
discussion of the sixth. 

Did the participants understand why they 
had been identified? Yes. Over 90% of the 
respondents stated that their school had been 
identified because their Chapter 1 students 
had not shown sufficient gains on norm-refer- 
enced tests (NRTs). In a few cases, schools 
had volunteered for program improvement, 
reporting that they believed the process 
would be a healthy one for their school. 

Had schools actually made changes? This 
question was complicated by issues of timing. 
The schools had been identified on the basis 
of 1989-1990 achievement data (the first year 
of implementation), and questionnaires were 
sent during the spring of 1991. Hawkins- 
Stafford requires that by the end of the first 
year after identification, schools produce a 
plan of action. Neither the law nor subse- 
quent regulations require that schools fully 
implement their plans in year one. 

Over two-thirds of the schools had begun 
implementation. The most frequent excep- 
tions were schools preparing to move to com- 
puter-based provision of Chapter 1 services. 
At several of those sites, the schools were 
awaiting the delivery of hardware and soft- 
ware, and the provision of training. Among 
the schools which had begun implementa- 
tion, no clear patterns were apparent from 
the responses. Although no single type of 
change predominates, it is clear that in this 
sample there was no wholesale movement 
toward either test-driven instruction or blam- 
ing the measurement tool for the purported 
lack of success. Some schools which had re- 
lied on pullout programs were moving to in- 
class, some in-class were moving to com- 
puter-assisted instruction (CAI), and other 
than those moving to CAI, none indicated a 
shift to a nationally recognized program 
model, such as Reading Recovery or Accel- 
erated Schools. 

What factors did educators perceive to be 
inhibiting or facilitating change? An interest- 
ing finding concerned the role of each state 
department of education in structuring the 
program improvement processes. In two of 
the three states, the state Chapter 1 directors 
had opted to use their regional Chapter 1
Technical Assistance Center (TAC) and/or 
Rural Technical Assistance Center (R-TAC) 
as part of a year-long program improvement 
planning process. This resulted in increased 
reports of collaboration among staffs and a 
wide variety of program changes. The third 
state's Chapter 1 director had opted to en- 
courage individual schools and districts to 
use a self-assessment questionnaire based on 
the "Thirteen Attributes of Effective Com- 
pensatory Education Programs" (Griswold, 
Cotton, & Hansen, 1986). Literal interpreta- 
tions of data gathered on that instrument 
have been found to suggest that "parent in- 
volvement" and "coordination among pro- 
grams" are the two greatest needs in Chapter 
1 programs (Davis & Billig, 1989). Not sur- 
prisingly, the majority of projects in the third 
state focused their improvement efforts on 
parent involvement and increased coordina- 
tion. 

The most commonly reported answer to a 
question on factors inhibiting change was 
"none." Second was a perceived lack of sup- 
port. Some of the teacher respondents per- 
ceived that their principals were not open, 
concerned, or involved in the process. Some 
of the principals saw a rigid central office 
staff at the center of their problems. Others 
reported that the lack of funds, difficulties in 
scheduling, problems with physical space, 
and the resistance of some professionals were 
difficulties. 

The most commonly reported answers to a 
question on factors facilitating change in- 
cluded the commitment of the whole school's 
faculty to school improvement, collaborative 
problem solving, and administrative support 
and monitoring. 

Was evidence of outcomes available? Given 
that virtually all of the schools were in pro- 
gram improvement because of lack of gains 
on NRTs and that in most districts the ques- 
tionnaire was received before the next year's 
test data were available, it was not surprising 
that most respondents reported that it was 
"too early to tell" whether improvements 
had been successful. A few schools were able 
to report achievement gains, and several 
schools noted that parents seemed more involved
and more pleased with Chapter 1.
Others noted an increase in staff involvement
in and ownership of Chapter 1, and one re- 
ported students reading more widely as evi- 
dence of success. 

Toward the end of the questionnaire, 
school personnel were asked, "Do you think
that the Chapter 1 program improvement leg- 
islation has had a positive or negative impact 
on Chapter 1 as a whole? Why?" Fully 84% of 
the respondents stated that the program im- 
provement statutes were having a positive 
effect on Chapter 1. Responses varied from
identifying with the idealism of the legislation
("We feel that the impact of program
improvement is positive because school im- 
provement is a vital part of the educational 
process. We should strive for the 
best. . . .") to a rather conservative pragmatism
("The legislation had a positive impact,
making schools more accountable. It
has really helped in this district because the
ownership for the Chapter 1 program returned
to the school level . . ."). Teachers
and principals reported being more aware of 
the needs of students, the needs of their 
school, and of the options available to them. 

Five percent stated that the effect of the 
legislation was negative. Persons expressing 
negative views included principals and 
teachers. The most frequent reason given for 
a negative response was a questioning of the 
validity or relevance of the NRTs. One Chap- 
ter 1 teacher stated a resentment that Chap- 
ter 1 teachers "are held totally accountable 
for a student's improvement, where the class- 
room teacher isn't." 

Eleven percent of the respondents stated 
that the effect was neutral or that it was too 
early to tell. One principal observed that as 
long as control of Chapter 1 remained cen- 
tralized at the district office, the program's 
potential for effect was fixed and the law's 
effects were insignificant. 

Discussion and Implications 

Much of the public debate regarding the 
program improvement sections of the 
Hawkins-Stafford Amendments concerns
the assumption of negative perceptions re- 
sulting from being "targeted" for program 
improvement. Our study indicates that it is 
possible to structure program improvement 
processes so that the large majority of local
educators, even those who have been targeted,
are supportive of the program improvement
goals and processes. If a potential
stigma on adults can be turned into produc- 
tive change, then the debate on Chapter 1 
program improvement can move forward to 
more child-focused issues. This requires pa- 
tience, wisdom, and firm but gentle guidance 
from several levels of bureaucracy above the 
teacher and school. 

A second implication concerns staff devel- 
opment, coordination, and buy-in. Many 
regular classroom teachers and principals in 
Chapter 1 schools were unaware that their 
schools had become accountable for the mea- 
sured academic growth of their poorest per- 
forming students. Most had not known what 
an NCE was, much less felt responsible for 
the production of more of them. Often they 
felt unconnected to the processes and out- 
comes of compensatory education. These 
are staff development problems. The cur- 
rent research gives some cause for optim- 
ism regarding the salutary effects of cross- 
program, often schoolwide staff develop- 
ment. Such staff development efforts can 
improve both coordination and regular 
classroom teachers' buy-in to the goals and 
processes of Chapter 1. 

A third finding concerned the handling of 
program improvement by the states. The 
three states involved in our survey had used 
moderately differing program improvement 
processes and had reaped differing results. A 
very large natural experiment in change is 
happening in Chapter 1 today. It is impor- 
tant, but is not being thoroughly researched. 
Our results indicate that state-level differences
in process may be producing considerable
differences in local educators' perceptions
of the law, local options, and the value
of Chapter 1. 

A fourth issue which emerged in our work 
with various states and in analyses of the data 
concerns Chapter 1 evaluations. Volume
1, Number 2, of Educational Evaluation and
Policy Analysis (1979) was devoted to technical
problems surrounding the Title I Evaluation
and Reporting System (TIERS), and little
technical work has been conducted in the
area since. This is in spite of the fact that
significant breakthroughs have been made in
the areas of testing and, importantly, in the 
technical requirements for measuring change 
(e.g., Raudenbush & Bryk, 1989; Rogosa, 
1989; Willett, 1989). One clear implication of 
this later research is the need for "three 
points in time" for measuring gain. Require- 
ments for "targeted" program improvement 
need to be based on the strongest possible 
evidence. Both technical issues and evalua- 
tion use issues in Chapter 1 need new atten- 
tion. 
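
The call for "three points in time" can be made concrete with a short sketch. One common reading of that recommendation is that a simple pretest-to-posttest difference cannot be separated from one-time fluctuation, whereas three or more measurements allow a trend line to be fitted. The code below fits an ordinary least-squares slope to a single student's scores; the function name and the scores are hypothetical, and the sketch is not the procedure of any of the cited authors.

```python
def ols_slope(times, scores):
    """Least-squares slope of scores on times (growth per unit of time)."""
    n = len(times)
    if n < 2 or n != len(scores):
        raise ValueError("need matching lists with at least two points")
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    return sxy / sxx


# Hypothetical NCE scores for one student at three annual testings.
times = [0, 1, 2]
scores = [30.0, 38.0, 34.0]

# Two-point gain from the first two testings only.
two_point_gain = scores[1] - scores[0]

# Growth rate estimated from all three testings.
three_point_slope = ols_slope(times, scores)
```

In this made-up case the first two testings alone suggest a gain of 8 points, while the slope fitted through all three is 2 points per year, which shows how much a single pair of scores can overstate growth.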

There is a great need for more solid re- 
search on practical options for program im- 
provement. It is not enough to tell programs 
that they must improve. If the federal government
is to mandate change, then it should
also provide a considerable list of previously 
researched and independently evaluated op- 
tions for change. These should be accom- 
panied with a matrix of conditions under 
which various programs might be more or 
less viable choices. The development of such 
knowledge would require a great deal of additional
research. As the accounting firm of Arthur
Andersen & Co. has observed, any industry
which spends as small a percentage of its
total operating budget on research as does edu- 
cation would soon find itself hopelessly out- 
stripped by its competitors. Moreover, the logi- 
cal level for funding of educational research is 
federal (Measelle & Egol, 1990). 

If program improvement strategies are to 
be fully implemented, Congress will have to 
allocate much more money for state and local 
support of change. Under Hawkins-Stafford,
states receive $90,000 per year to facilitate 
program improvement. This is an inadequate 
amount for Wyoming's needs and hardly 
worth mentioning in a state the size of Cali- 
fornia. Studies of change consistently find 
that change takes time, coordination, leader- 
ship, and multiyear support. 

We began this article with a long list of 
reasons why the program improvement re- 
quirements of the Hawkins-Stafford Amend- 
ments to Chapter 1 might have difficulty in 
being successfully implemented. The data 
from our and other studies indicate that care- 
fully implemented program improvement 
can become a force for more fully integrated, 
thoughtful educational programming and 
higher achievements for disadvantaged students.
The data from our modest sample also
indicate that local educators can overcome 
concerns regarding being "targeted" for pro- 
gram improvement and can focus on the im- 
portant issues of providing the best possible 
services to children. The primary challenge 
facing the authors of the next reauthorization 
of Chapter 1 will be in providing sensible 
policies, research, and funding to support 
enhanced program improvement. 



statements and philosophical arguments; (3) critical syntheses of a field of educational
inquiry; and (4) integrations of educational scholarship, policy, and practice.
For additional information contact: Philip W. Jackson, Editor, American
Journal of Education, 5835 Kimbark Avenue, Chicago, Illinois 60637. 



American Sociological Review is the official journal of the American 
Sociological Association and the leading journal in sociology. It publishes 
original (not previously published) works of interest to the discipline in general, 
about new theoretical developments, results of research that advance our understanding
of fundamental social processes, and important methodological innovations.
All areas of sociology are welcome. Emphasis is on exceptional quality and
general interest. For additional information contact: Paula S. England, Editor, 
American Sociological Review, Department of Sociology, University of Arizona, 
Tucson, Arizona 85721.



Educational Evaluation and Policy Analysis (EEPA) focuses on
educational evaluation, educational policy analysis, and the relationship between
the two activities. It strives to serve the multiple needs of the diverse specialists 
currently working in educational evaluation and policy analysis. EEPA deals not 
only with theoretical and methodological issues but also with the intensely practi- 
cal concerns of individuals engaged in the evaluation of educational enterprises 
and the formulation of educational policy. It apprises readers of current develop- 
ments in the emerging educational specializations of evaluation and policy 
analysis. For additional information contact: American Educational Research As- 
sociation, Publications Department, 1230 17th Street NW, Washington, DC 
20036-3078. 



Journal of Educational Psychology publishes original investigations
dealing with learning and cognition, social and emotional processes, and human
development as they relate to problems of instruction. Journal articles pertain to 
all levels of education and to all age groups. For additional information, contact: 
Joel R. Levin, Editor, Department of Educational Psychology, University of Wis- 
consin, 1025 West Johnson Street, Madison, Wisconsin 53706-1796. 



Reading Research Quarterly is published four times a year by the International
Reading Association as a service to its members and other interested persons,
and is intended to provide a forum for the exchange of information and
opinion on theory, research, and practice in reading. For additional information 
contact: Reading Research Quarterly, The Ohio State University, 210A Arps Hall,
1945 North High Street, Columbus, Ohio 43210-1177.



Review of Educational Research (RER) publishes critical, integrative 
reviews of research literature bearing on education. Such reviews should include 
conceptualizations, interpretations, and syntheses of literature and scholarly work 
in a field. RER encourages the submission of research relevant to education from 
any discipline, such as reviews of research in psychology, sociology, history, 
philosophy, political science, economics, computer science, statistics, anthropology,
and biology, provided that the review bears on educational issues. RER does
not publish original empirical research unless it is incorporated in a broader integrative
review. RER will occasionally publish solicited, but carefully refereed,
analytical reviews of special topics, particularly from disciplines infrequently represented.
For additional information contact: American Educational Research Association,
Publications Department, 1230 17th Street NW, Washington, DC
20036-3078. 



Urban Education is a publication of Corwin Press, a subsidiary of Sage Publications,
Inc. The journal, publishing refereed manuscripts quarterly, is the
premier journal addressing issues related to the education of urban youth. Once a
year, Urban Education offers a special issue focusing on one facet of education
in the inner city. The journal not only looks at the problems in urban schools; it 
also offers solutions, in part by highlighting successful programs in the inner 
cities. In addition to publishing state-of-the-art research and conceptual articles in 
each issue, Urban Education regularly features essay reviews of current publica- 
tions in the area of schooling in urban centers. For additional information contact: 
Dr. Kofi Lomotey, Editor, Urban Education, 111 Peabody Hall, College of Educa- 
tion, Louisiana State University, Baton Rouge, Louisiana, 70803. 